Announcing the release of Hortonworks Data Platform 2.0 Beta

Another week, another release…  Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.

For our customers, this is a great opportunity to confirm that the release meets expectations and to give feedback that will improve Hadoop and shape its roadmap.  Our partners will be able to test their applications on a stable, reliable v2 of Hadoop.  This also means that applications besides MapReduce, such as Apache Tez, HBase on YARN (HOYA) and Storm-on-YARN, can be tested against a commercial distribution.

Hadoop 2.0

This release contains the most recent version of Hadoop 2.0, which was itself declared beta only last week.  Key features in the HDP beta include the following:

  • Hadoop Core
    • MapReduce in YARN
      Now ready for prime time! YARN enables the compute layer in Hadoop to be extended to workloads that go beyond batch processing. Also, all your MapReduce applications written in HDP 1.x are forward compatible with YARN in HDP 2.x.
    • NameNode HA with hot failover
      While this feature is also available in the HDP 1.x line, it no longer requires shared storage or third-party clustering solutions. With HDP 2.0, NameNode HA with hot failover is possible in native Hadoop. Quorum Journal Manager nodes help propagate every namespace change to the standby NameNode and thus remove the need for shared storage. In addition, ZooKeeper failover controller nodes automatically detect planned or unplanned NameNode failures and thus remove the need for any third-party clustering solution.
    • NFS mount capability for HDFS
      You can mount the HDFS cluster as a volume on client machines and use native command-line tools, scripts or a file explorer UI to view HDFS files and load data into HDFS.  NFS thus enables file-based applications to read and write files directly in Hadoop. This greatly simplifies data management in Hadoop and expands the integration of Hadoop into existing toolsets.
    • Provision, Manage and Monitor
      With this release, Ambari adds many improvements as well as the ability to install and configure Hadoop 2 components such as YARN and MapReduce2.
    • Stinger improvements for performance & SQL compatibility with Apache Hive
      Based on the concepts in YSmart, Hive now has a new logical optimizer called ‘Correlation optimizer’ that speeds up processing by merging correlated MapReduce jobs into a single MapReduce job. ORC files also now support predicate pushdown, which significantly speeds up selective queries by skipping more than 10K rows at a time when those rows cannot satisfy the predicate.
    • HBase Compaction and new Data Types
      HDP 2.0 includes HBase 0.96, which has enhancements like improved compactions and additional datatype support. Given the nature of the use cases HBase addresses, reducing recovery times becomes extremely important. One key feature in HBase 0.96 is the ability to recover from failures in seconds. This is achieved through sub-second detection of master process and region server failures.
    • Oozie Scheduling Improvements
      Improved crontab-style scheduling, with options such as running jobs on the first or last day of a month or quarter, or at a particular hour on a weekday.
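
To give a feel for why predicate pushdown in ORC can skip so many rows at once, here is a minimal Python sketch of the general idea. It is not Hive or ORC code: the names (`RowGroup`, `build_row_groups`, `scan_equals`) and the 10,000-row group size are assumptions made for illustration. The sketch keeps min/max statistics per group of rows and skips any group whose range cannot contain the value being searched for.

```python
from dataclasses import dataclass
from typing import List, Tuple

STRIDE = 10_000  # assumed rows per group, chosen only for illustration


@dataclass
class RowGroup:
    start: int         # index of the first row in this group
    values: List[int]  # column values stored in this group
    min_value: int     # smallest value in the group
    max_value: int     # largest value in the group


def build_row_groups(column: List[int], stride: int = STRIDE) -> List[RowGroup]:
    """Split a column into fixed-size groups and record min/max per group."""
    groups = []
    for start in range(0, len(column), stride):
        chunk = column[start:start + stride]
        groups.append(RowGroup(start, chunk, min(chunk), max(chunk)))
    return groups


def scan_equals(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return (matching row indexes, number of groups skipped unread)."""
    matches, skipped = [], 0
    for g in groups:
        # The statistics prove no row in this group can match: skip it whole.
        if target < g.min_value or target > g.max_value:
            skipped += 1
            continue
        matches.extend(g.start + i for i, v in enumerate(g.values) if v == target)
    return matches, skipped


column = list(range(50_000))          # sorted data, the best case for skipping
groups = build_row_groups(column)
matches, skipped = scan_equals(groups, 23_456)
print(matches, skipped)               # prints: [23456] 4
```

In this toy case only one of the five groups is actually read. Skipping works best when the data is sorted or clustered on the filtered column; on randomly ordered data most groups would span the whole value range and few could be skipped.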

Stay Tuned… There’s More!

Over the course of the next few weeks, we have a series of blog posts lined up with details on each of these features and more.  We’ll also provide regular updates on the beta program, software updates and general insight into the status of 2.0!

