Announcing the release of Hortonworks Data Platform 2.0 Beta

Another week, another release… Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release to include the stable YARN API and protocols on which new applications can now be built.

For our customers, this is a great opportunity to confirm that the release meets expectations and to give feedback that will improve Hadoop and shape its roadmap. Our partners will be able to test their applications on a stable, reliable v2 of Hadoop. It also means that applications other than MapReduce, such as Apache Tez, HBase on YARN (HOYA), and Storm-on-YARN, can be tested against a commercial distribution.

Hadoop 2.0

This release contains the most recent version of Hadoop 2.0, which was declared beta just last week. Key features under test in the HDP beta include the following:

  • Hadoop Core
    • MapReduce in YARN
      Now ready for prime time! YARN enables the compute layer in Hadoop to be extended to workloads that go beyond batch processing. Also, all your MapReduce applications written for HDP 1.x are forward compatible with YARN in HDP 2.x.
    • NameNode HA with hot failover
      While this feature is also available in the HDP 1.x line, it no longer requires shared storage or third-party clustering solutions. With HDP 2.0, NameNode HA with hot failover is possible in native Hadoop. Quorum Journal Manager nodes propagate every namespace change to the standby NameNode and thus remove the need for shared storage. ZooKeeper failover controller nodes automatically detect planned or unplanned NameNode failures and thus remove the need for any third-party clustering solution.
    • NFS mount capability for HDFS
      You can mount the HDFS cluster as a volume on client machines and use the native command line, scripts, or a file explorer UI to view HDFS files and load data into HDFS. NFS thus enables file-based applications to read from and write to Hadoop directly. This greatly simplifies data management in Hadoop and extends its integration with existing toolsets.
    • Provision, Manage and Monitor
      With this release, Ambari adds many improvements, along with the ability to install and configure Hadoop 2 components such as YARN and MapReduce2.
    • Stinger improvements for performance and SQL compatibility with Apache Hive
      Based on the concepts in Y-Smart, Hive now has a new logical optimizer, the "Correlation optimizer", that speeds up processing by merging correlated MapReduce jobs into a single MapReduce job. ORC files also now support predicate pushdown, which significantly speeds up precise queries by skipping, more than 10K rows at a time, rows that don't satisfy the predicate.
    • HBase Compaction and new Data Types
      HDP 2.0 includes HBase 0.96, which brings enhancements such as improved compactions and additional data type support. Given the nature of the use cases HBase addresses, reducing recovery times is extremely important. One key feature in HBase 0.96 is the ability to recover from failures in seconds, achieved through sub-second detection of master process and region server failures.
    • Oozie Scheduling Improvements
      Oozie improves crontab-style scheduling with options such as running jobs on the first or last day of a month or quarter, or at a particular hour on a weekday.
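
To make the ORC predicate pushdown idea from the Stinger item above more concrete, here is a small conceptual sketch in Python. It is not Hive or ORC code: the RowGroup class, the STRIDE value of 10,000 rows, and the helper names are all hypothetical, chosen only to show how per-group min/max statistics let a reader skip whole blocks of rows without scanning them.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption for illustration: rows are grouped into blocks of 10,000.
STRIDE = 10_000


@dataclass
class RowGroup:
    """A block of rows plus the min/max statistics kept alongside it."""
    rows: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], stride: int = STRIDE) -> List[RowGroup]:
    """Split values into fixed-size groups and record min/max per group."""
    groups = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equals(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return rows equal to target and the number of groups skipped.

    A group is skipped without reading its rows when its min/max range
    shows that no row in it can satisfy the predicate.
    """
    matches: List[int] = []
    skipped = 0
    for group in groups:
        if target < group.min_value or target > group.max_value:
            skipped += 1
            continue
        matches.extend(v for v in group.rows if v == target)
    return matches, skipped


# Hypothetical data: 50,000 sorted values, i.e. five groups of 10,000 rows.
values = list(range(50_000))
groups = build_row_groups(values)
matches, skipped = scan_equals(groups, 12_345)
print(matches, skipped)
```

In this sketch only the group whose range contains the target is actually scanned. The benefit depends on how the data is laid out: if matching values were spread across every group, nothing could be skipped.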

Stay Tuned… There’s More!

Over the next few weeks, we have a series of blog posts lined up with details on each of these features and more. We'll also provide regular updates on the beta program, software updates, and general insight into the status of 2.0!


