From the Dev Team

Follow the latest developments from our technical team

Apache YARN, Apache Slider, and Docker

Join us June 19 at 6 pm at the Hilton Fort Worth, Texas for an educational workshop hosted by Hortonworks and Sendero Business Services. The topic is “The Key To Success is Consistently Making Good Decisions & The Key To Good Decisions is Good Information.” The speaker is Don Hilborn, Solutions Engineer at Hortonworks.

Don will introduce the paradigm of

  • Efficiency – double processing in Hadoop on the same hardware while providing predictable performance and quality of service; and
  • Resource sharing – providing a stable common set of shared resources across multiple, coordinated workloads in Hadoop.

This is the second in a series of blog posts exploring how to write data-driven applications in Java using the Cascading SDK. The posts in the series are:

  • WordCount
  • Log Parsing

    Historically, programming languages and software frameworks have evolved in a singular direction, with a singular purpose: to achieve simplicity, hide complexity, improve developer productivity, and make coding easier. In the process, they have fostered innovation to the degree we see, and benefit from, today.

    Is anyone among you “young” enough to admit to writing code in microcode and assembly language?…

    With the release of Apache Hadoop YARN in October of last year, organizations are moving from single-application Hadoop clusters to a versatile, integrated Hadoop 2 data platform hosting multiple applications — eliminating silos, maximizing resources and bringing true multi-workload capabilities to Hadoop.  Many enterprises have adopted YARN as the architectural center of a set of integrated technologies and capabilities that form the blueprint for enterprise Hadoop.

    YARN Enabling the Ecosystem Technologies

    Hortonworks is making it easier to develop YARN applications through a number of technologies. …

    Introduced in 2008, Apache Hive has been the de facto SQL solution in Hadoop. By 2012, SQL had become a key battleground for Hadoop, and many vendors started to publish benchmarks showing massive performance advantages for their solutions over Hive. Each of these vendors predicted that Hive would eventually be supplanted by the proprietary solution they were pushing.

    The concerns about Hive’s performance were real. Hadoop in 2012 was a purely batch platform and no work had ever been done within Hive to address low-latency or interactive workloads.…

    A significant reason for the increased adoption of the Hortonworks Data Platform by customers and partners has been Apache Hadoop YARN. This major advance, introduced last October in HDP2, allows Hadoop to move from many single-purpose clusters to a versatile, integrated data platform that hosts multiple business applications.

    YARN has become the architectural center of Hadoop. We intend to make it easier for applications to work in a YARN environment, and benefit from the integrated capabilities and technologies that form the blueprint for enterprise Hadoop.…

    Apache Ambari has always given operators the ability to provision an Apache Hadoop cluster using an intuitive Cluster Install Wizard web interface, which guides the user through a series of steps:

    • confirming the list of hosts,
    • assigning master, slave, and client components,
    • configuring services, and
    • installing, starting and testing the cluster.

    With Ambari Blueprints, system administrators and dev-ops engineers can expedite the process of provisioning a cluster. Once defined, Blueprints can be re-used, which facilitates easy configuration and automation for each successive cluster creation.…

    We recently hosted the fourth of our seven Discover HDP 2.1 webinars, entitled Apache Hadoop 2.4.0, HDFS and YARN. It was well attended and very informative. The speakers outlined the new features in YARN and HDFS in HDP 2.1, including:

    • HDFS Extended ACLs
    • HTTPS support for WebHDFS and for the Hadoop web UIs
    • HDFS Coordinated DataNode Caching
    • YARN Resource Manager High Availability
    • Application Monitoring through the YARN Timeline Server
    • Capacity Scheduler Preemption

    Many thanks to our presenters, Rohit Bakhshi (Hortonworks’ senior product manager), Vinod Kumar Vavilapalli (co-author of the YARN Book, PMC, Hadoop YARN Project Lead at Apache and Hortonworks), and Justin Sears (Hortonworks’ Product Marketing Manager).…

    Traditionally, HDFS, Hadoop’s storage subsystem, has focused on one kind of storage medium, namely spindle-based disks.  However, a Hadoop cluster can contain significant amounts of memory and with the continued drop in memory prices, customers are willing to add memory targeted at caching storage to speed up processing.

    Recently HDFS generalized its architecture to include other kinds of storage media, including SSDs and memory [1]. We also added support for caching hot files in memory [2].…

    Julian Hyde will present the following talks at the Hadoop Summit:

  • “Discardable In-Memory, Materialized Query for Hadoop,” (June 3rd, 11:15-11:55 am)
  • “Cost-based Query Optimization in Hive,” (June 4th, 4:35 pm-5:15 pm)

    What to do with all that memory in a Hadoop cluster? The question is frequently heard. Should we load all of our data into memory to process it? Unfortunately the answer isn’t quite that simple.

    The goal should be to put memory into its right place in the storage hierarchy, alongside disk and solid-state drives (SSD).…

    The Apache Ambari community is happy to announce last week’s release of Apache Ambari 1.6.0, which includes exciting new capabilities and resolves 288 JIRA issues.  

    Many thanks to all of the contributors in the Apache Ambari community for the collaboration to deliver 1.6.0, especially with Blueprints, a crucial feature that enables rapid instantiation and replication of clusters.

    Each release of Ambari makes substantial strides in simplifying the work of system administrators and dev-ops engineers who deploy, manage, and monitor large Hadoop clusters, including those running on Hortonworks Data Platform 2.1 (HDP).…

    On Wednesday, May 21, Himanshu Bari (Hortonworks’ senior product manager), Venkatesh Seetharam (committer to Apache Falcon), and Justin Sears (Hortonworks’ Product Marketing Manager) hosted the third of our seven Discover HDP 2.1 webinars. Himanshu and Venkatesh discussed data governance in Hadoop through Apache Falcon, which is included in HDP 2.1. As most of you know, ingesting data into Hadoop is one thing; governing that data by defining and enforcing data-pipeline policies is another, and it is a necessity in the enterprise.…

    According to the New York Observer, a couple of major social factors spurred the genesis and growth of Meetup.com. The first was Robert Putnam’s book Bowling Alone, in which he talks about the collapse of communities in America. The second was an event that changed not only the world but also New York: the aftermath of September 11, when strangers cared about greeting, meeting, and talking.…

    On May 15, Owen O’Malley and Carter Shanklin hosted the second of our seven Discover HDP 2.1 webinars. Owen and Carter discussed the Stinger Initiative and the improvements to Apache Hive that are included in HDP 2.1:

    • Faster queries with Hive on Tez, vectorized query execution and a cost-based optimizer
    • New SQL semantics and datatypes
    • SQL-standard authorization
    • The Hive job visualizer in Apache Ambari
    • And many more

    Here is the complete recording of the webinar.…

    Last week Vinay Shukla and Kevin Minder hosted the first of our seven Discover HDP 2.1 webinars. Vinay and Kevin covered three important topics related to new Apache Hadoop security features in HDP 2.1:

    • REST API security with Apache Knox Gateway
    • HDFS security with Access Control Lists (ACLs)
    • SQL security and next-generation Hive authorization

    Here is the complete recording of the webinar.

    Here are the presentation slides: http://www.slideshare.net/hortonworks/discoverhdp21security

    Attend our next Discover HDP 2.1 webinar tomorrow, Thursday, May 15 at 10am Pacific Time: Interactive SQL Query in Hadoop with Apache Hive

    We’re grateful to the many participants who joined and asked excellent questions.…

    I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command-line-oriented utilities. With composability comes power and with specialization comes simplicity. Sometimes, though, when two utilities are used together all the time, it makes sense to have either:

    • A utility that specializes in a very common use-case
    • One utility to provide basic functionality from another utility

    For example, one thing that I find myself doing a lot of is searching a directory recursively for files that contain an expression:
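
    The command from the original post is not reproduced in this excerpt. As a minimal sketch of the idea, assuming a POSIX-style shell with find, xargs and grep available, a recursive search can be composed as follows; the function name search_tree is hypothetical and used only for illustration:

```shell
# Hypothetical helper (not from the original post): print the names of
# files under a directory whose contents match a regular expression.
search_tree() {
    dir="$1"
    pattern="$2"
    # find lists every regular file under $dir; xargs passes them to grep
    # in batches. -print0 / -0 keep file names containing spaces intact,
    # and grep -l prints only the names of the files that match.
    find "$dir" -type f -print0 | xargs -0 grep -l -e "$pattern"
}
```

    For instance, search_tree . 'TODO' would list the files under the current directory that contain the string TODO.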

    Even though you can do this, specialized utilities such as ack have emerged to simplify this style of querying.…
