From the Dev Team

Follow the latest developments from our technical team

Julian Hyde will present the following talks at the Hadoop Summit:

  • “Discardable In-Memory, Materialized Query for Hadoop,” (June 3rd, 11:15-11:55 am)
  • “Cost-based Query Optimization in Hive,” (June 4th, 4:35-5:15 pm)

    What to do with all that memory in a Hadoop cluster? It is a question we hear frequently. Should we load all of our data into memory to process it? Unfortunately, the answer isn’t quite that simple.

    The goal should be to put memory into its right place in the storage hierarchy, alongside disk and solid-state drives (SSD).…

    The Apache Ambari community is happy to announce last week’s release of Apache Ambari 1.6.0, which includes exciting new capabilities and resolves 288 JIRA issues.  

    Many thanks to all of the contributors in the Apache Ambari community for collaborating to deliver 1.6.0, especially on Blueprints, a crucial feature that enables rapid instantiation and replication of clusters.

    Each release of Ambari makes substantial strides toward simplifying how system administrators and DevOps engineers deploy, manage, and monitor large Hadoop clusters, including those running Hortonworks Data Platform (HDP) 2.1.…

    On Wednesday, May 21, Himanshu Bari (Hortonworks’ Senior Product Manager), Venkatesh Seetharam (committer to Apache Falcon), and Justin Sears (Hortonworks’ Product Marketing Manager) hosted the third of our seven Discover HDP 2.1 webinars. Himanshu and Venkatesh discussed data governance in Hadoop through Apache Falcon, which is included in HDP 2.1. As most of you know, ingesting data into Hadoop is one thing; governing that data by defining and enforcing data-pipeline policies is another, and it is a necessity in the enterprise.…

    According to the New York Observer, a couple of major social forces spurred the genesis and growth of Meetup.com. The first was Robert Putnam’s book Bowling Alone, in which he describes the collapse of communities in America. The second was an event that changed not only the world but also New York: the aftermath of September 11, when strangers made a point of greeting, meeting, and talking to one another.…

    On May 15, Owen O’Malley and Carter Shanklin hosted the second of our seven Discover HDP 2.1 webinars. Owen and Carter discussed the Stinger Initiative and the improvements to Apache Hive that are included in HDP 2.1:

    • Faster queries with Hive on Tez, vectorized query execution and a cost-based optimizer
    • New SQL semantics and datatypes
    • SQL-standard authorization
    • The Hive job visualizer in Apache Ambari
    • And many more

    Here is the complete recording of the webinar.…

    Last week Vinay Shukla and Kevin Minder hosted the first of our seven Discover HDP 2.1 webinars. Vinay and Kevin covered three important topics related to new Apache Hadoop security features in HDP 2.1:

    • REST API security with Apache Knox Gateway
    • HDFS security with Access Control Lists (ACLs)
    • SQL security and next-generation Hive authorization

    Here is the complete recording of the webinar.

    Here are the presentation slides: http://www.slideshare.net/hortonworks/discoverhdp21security

    Attend our next Discover HDP 2.1 webinar tomorrow, Thursday, May 15 at 10am Pacific Time: Interactive SQL Query in Hadoop with Apache Hive

    We’re grateful to the many participants who joined and asked excellent questions.…

    I’m a pretty heavy Unix user, and I tend to prefer doing things the Unix Way™, which is to say, composing many small, command-line-oriented utilities. With composability comes power, and with specialization comes simplicity. Sometimes, though, when two utilities are used together all the time, it makes sense to have either:

    • A utility that specializes in a very common use case
    • One utility that provides basic functionality from another utility

    For example, one thing I find myself doing a lot is searching a directory recursively for files that contain an expression:

    Even though you can already do this, specialized utilities such as ack have emerged to simplify this style of querying.…
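To make the query concrete, here is a minimal Python sketch of the same recursive search: walk a directory tree and report every line that matches a regular expression, roughly what `grep -rn PATTERN DIR` prints. This is an illustration only, not the post's own command (which is not reproduced here). The function name `search_tree` is hypothetical, and skipping files that are not valid UTF-8 is an assumed simplification.

```python
import os
import re
import sys
from typing import Iterator, Tuple


def search_tree(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under `root` matching `pattern`.

    Hypothetical helper approximating `grep -rn`; non-UTF-8 or unreadable files are skipped.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # assumed policy: ignore binary or unreadable files


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python search_tree.py DIR PATTERN
    for path, lineno, line in search_tree(sys.argv[1], sys.argv[2]):
        print(f"{path}:{lineno}:{line}")
```

Even this small version has to decide on traversal order, output format, and what to do with binary files. Specialized tools such as ack make those decisions for you.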

    The term BoF (birds of a feather) session was first used at the Digital Equipment Computer Users’ Society (DECUS) conference in the 1960s. Its essence was to bring together like minds and thought leaders, just as birds of a feather flock together, to share and exchange computing ideas in an informal yet spirited way. Since then, the organizers and sponsors of most computing conferences have stayed true to that essence and spirit.

    After all, ideas and innovation happen in collaboration, not in isolation.…

    This is the second post in our series on the motivations and architecture behind improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are:

    Introduction: Phase I – Preserve Application-queues

    In the introductory post, we previewed what RM Restart Phase I entails. In essence, we save the application-queue state to a persistent store and reread it when the RM restarts, so users no longer need to resubmit their applications.…
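The Phase I idea (persist application state on submission, then reload it on restart) can be illustrated with a toy Python model. This is a conceptual sketch only. YARN's real Resource Manager is written in Java and uses its own pluggable state stores. Every class and field here (`AppRecord`, `FileStateStore`, `ToyResourceManager`) is hypothetical, and a single local JSON file stands in for the persistent store.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class AppRecord:
    """Hypothetical minimal record of a submitted application."""
    app_id: str
    queue: str
    user: str


class FileStateStore:
    """Toy persistent store: one JSON file mapping app_id -> AppRecord fields."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Dict[str, AppRecord]:
        if not os.path.exists(self.path):
            return {}  # first start: nothing to recover
        with open(self.path, encoding="utf-8") as fh:
            raw = json.load(fh)
        return {app_id: AppRecord(**fields) for app_id, fields in raw.items()}

    def save(self, apps: Dict[str, AppRecord]) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump({k: asdict(v) for k, v in apps.items()}, fh)
        os.replace(tmp, self.path)  # atomic rename: a crash never leaves a half-written file


class ToyResourceManager:
    """Hypothetical RM that persists each submission and reloads state when it starts."""

    def __init__(self, store: FileStateStore) -> None:
        self.store = store
        # Recovery step: reread previously accepted applications instead of starting empty.
        self.apps: Dict[str, AppRecord] = store.load()

    def submit(self, record: AppRecord) -> None:
        self.apps[record.app_id] = record
        self.store.save(self.apps)  # persist before treating the submission as accepted

    def apps_in_queue(self, queue: str) -> List[str]:
        return sorted(a.app_id for a in self.apps.values() if a.queue == queue)
```

Creating a second `ToyResourceManager` over the same store plays the role of a restart. The new instance sees every previously submitted application in its queue without anyone resubmitting it.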

    This is the first post in our series on the motivations and architecture behind improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are:

    Resource Manager (RM) is the central authority of Apache Hadoop YARN for resource management and scheduling. It is responsible for allocating resources to applications such as Hadoop MapReduce jobs, Apache Tez DAGs, and other applications running atop YARN.…

    Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates the process of deploying a multi-node Hadoop cluster using the MSI available in HDP 2.1 for Windows.

    Download HDP 2.1 for Windows

    HDP on Windows MSI Overview

    The HDP on Windows installation package comes in MSI format. Microsoft’s MSI format relies on Windows Installer, the installation and configuration service provided with Windows.…

    The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.

    The new features in Knox Gateway 0.4.0 are the ones enterprise security officers expect in a gateway solution:

    • Perimeter security for a Hadoop cluster
    • Support for enterprise group lookup
    • Audit log of all gateway activity
    • Command line tooling for CMF provisioning
    • Protection against web application vulnerabilities
    • Pre-authentication via SSO token
    • And many more…

    As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…

    Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of more than 30 individuals over five months and, combined with the Ambari 1.5.0 release, resolves more than 1,000 JIRAs.

    This version of Ambari makes huge strides in simplifying the deployment, management and monitoring of large Hadoop clusters, including those running Hortonworks Data Platform 2.1.…

    The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from more than 70 community members who worked diligently to close out more than 1,080 JIRA tickets.

    Hive 0.13 also delivers the third and final phase of the Stinger Initiative, a broad, community-based initiative to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.…

    The power of a well-crafted speech is indisputable, for words matter: they inspire action. So is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter: they make code simpler to write.

    In 2007, when Chris Wensel, the author of the Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write enterprise Big Data applications in MapReduce would be difficult, and that convincing developers to write directly to the MapReduce API was a potential blocker.…
