Another week, another release… Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.
For our customers, this beta is an opportunity to confirm that the release meets their expectations and to give feedback that will help improve Hadoop and shape its roadmap. Our partners can test their applications on a stable, reliable v2 of Hadoop. It also means that applications other than MapReduce, such as Apache Tez, HBase on YARN (HOYA) and Storm-on-YARN, now have a commercial distribution to test against.
This release includes the latest version of Hadoop 2.0, which was declared beta just last week. Key features in the HDP 2.0 beta include the following:
- Hadoop Core
- MapReduce in YARN
Now ready for prime time! YARN enables the compute layer in Hadoop to be extended to workloads that go beyond batch processing. All your MapReduce applications written for HDP 1.x are also forward compatible with YARN in HDP 2.x.
- NameNode HA with hot failover
NameNode HA is also available in the HDP 1.x line, but there it requires shared storage and third-party clustering solutions. With HDP 2.0, NameNode HA with hot failover is native to Hadoop. With the Quorum Journal Manager, a group of JournalNodes receives every namespace change from the active NameNode and makes it available to the standby NameNode, which removes the need for shared storage. ZooKeeper Failover Controller processes automatically detect NameNode failures, planned or unplanned, and trigger failover, which removes the need for any third-party clustering solution.
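To make the quorum idea concrete, here is a minimal Python sketch, not Hadoop code: an edit counts as committed only once a majority of JournalNodes acknowledge it, so losing a minority of nodes does not stop the NameNode from logging changes. The `JournalNode` class and `quorum_write` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class JournalNode:
    """Hypothetical stand-in for one JournalNode holding namespace edits."""
    name: str
    alive: bool = True
    edits: List[Tuple[int, str]] = field(default_factory=list)

    def write(self, txid: int, edit: str) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.alive:
            return False
        self.edits.append((txid, edit))
        return True


def quorum_write(nodes: List[JournalNode], txid: int, edit: str) -> bool:
    """Send an edit to every JournalNode; it counts as committed only
    if a strict majority of the nodes acknowledge it."""
    acks = sum(node.write(txid, edit) for node in nodes)
    return acks >= len(nodes) // 2 + 1


if __name__ == "__main__":
    nodes = [JournalNode(f"jn{i}") for i in range(1, 4)]
    print(quorum_write(nodes, 1, "mkdir /data"))     # True: 3 of 3 acks
    nodes[0].alive = False
    print(quorum_write(nodes, 2, "create /data/a"))  # True: 2 of 3 acks
    nodes[1].alive = False
    print(quorum_write(nodes, 3, "delete /data/a"))  # False: 1 of 3 is not a majority
```

With N JournalNodes the scheme tolerates (N - 1) / 2 failures, which is why the HDFS documentation recommends running an odd number of JournalNodes, at least three.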
- NFS mount capability for HDFS
You can mount the HDFS cluster as a volume on client machines and use native command-line tools, scripts or a file explorer UI to browse HDFS files and load data into HDFS. NFS thus enables file-based applications to read and write files directly in HDFS. This greatly simplifies data management in Hadoop and expands the integration of Hadoop into existing toolsets.
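Because the mount looks like an ordinary directory, file-based code needs no Hadoop client library. The Python sketch below uses only the standard library and assumes HDFS is mounted at a hypothetical path such as `/hdfs`; the function names are illustrative. Since HDFS files are written sequentially, the sketch sticks to whole-file copies rather than in-place edits.

```python
import shutil
from pathlib import Path
from typing import List


def load_into_hdfs(local_file: str, mount_point: str, hdfs_dir: str) -> Path:
    """Copy a local file into an HDFS directory through an NFS mount.

    `mount_point` is wherever the client mounted HDFS (for example a
    hypothetical /hdfs); `hdfs_dir` is the target HDFS path.
    """
    target_dir = Path(mount_point) / hdfs_dir.lstrip("/")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(local_file).name
    # copyfile streams the data front to back, a purely sequential write.
    shutil.copyfile(local_file, target)
    return target


def list_hdfs_dir(mount_point: str, hdfs_dir: str) -> List[str]:
    """List the entries of an HDFS directory with ordinary directory calls."""
    return sorted(p.name for p in (Path(mount_point) / hdfs_dir.lstrip("/")).iterdir())


# Example, assuming HDFS is mounted at /hdfs on this client:
#   load_into_hdfs("/tmp/events.csv", "/hdfs", "/user/alice/raw")
#   print(list_hdfs_dir("/hdfs", "/user/alice/raw"))
```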
- Provision, Manage and Monitor
With this release, Ambari adds tons of improvements, along with the ability to install and configure Hadoop 2 components such as YARN and MapReduce2.
- Stinger improvements for performance & SQL compatibility with Apache Hive
Based on the concepts in YSmart, Hive now has a new logical optimizer, the Correlation Optimizer, that speeds up processing by merging correlated MapReduce jobs into a single MapReduce job. ORC files also now support predicate pushdown, which significantly speeds up selective queries by skipping entire groups of rows (10,000 rows at a time by default) whose statistics show they cannot satisfy the predicate.
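The row-skipping idea behind ORC predicate pushdown can be sketched in a few lines of Python. This is a simplified model, not Hive's ORC reader: it keeps min/max statistics for each block of rows on a single integer column and skips any block whose range cannot overlap the predicate `lo <= value <= hi`. The names `RowGroup`, `build_row_index` and `scan_with_pushdown` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Rows summarized by each row-index entry; 10,000 is ORC's default stride.
ROW_INDEX_STRIDE = 10_000


@dataclass
class RowGroup:
    start_row: int
    min_value: int
    max_value: int


def build_row_index(values: List[int], stride: int = ROW_INDEX_STRIDE) -> List[RowGroup]:
    """Record min/max statistics for every block of `stride` rows."""
    return [
        RowGroup(start, min(values[start:start + stride]), max(values[start:start + stride]))
        for start in range(0, len(values), stride)
    ]


def scan_with_pushdown(values: List[int], lo: int, hi: int,
                       stride: int = ROW_INDEX_STRIDE) -> Tuple[List[int], int, int]:
    """Evaluate `lo <= value <= hi`, reading only row groups whose
    [min, max] range overlaps [lo, hi].

    Returns (matching values, row groups read, total row groups).
    """
    index = build_row_index(values, stride)
    matches, groups_read = [], 0
    for group in index:
        if group.max_value < lo or group.min_value > hi:
            continue  # statistics prove no row here can match: skip the group
        groups_read += 1
        for v in values[group.start_row:group.start_row + stride]:
            if lo <= v <= hi:
                matches.append(v)
    return matches, groups_read, len(index)


# A sorted column of 100,000 values splits into 10 row groups, and only
# one of them can contain values in [42,000, 42,010].
matches, read, total = scan_with_pushdown(list(range(100_000)), 42_000, 42_010)
print(len(matches), read, total)  # 11 1 10
```

Skipping works best when the filtered column is sorted or clustered; if matching values are scattered across the file, nearly every group's min/max range overlaps the predicate and little can be skipped.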
- HBase Compaction and new Data Types
HDP 2.0 includes HBase 0.96, which brings enhancements such as improved compactions and support for additional data types. Given the kinds of use cases HBase addresses, reducing recovery time is extremely important, and one key feature of HBase 0.96 is the ability to recover from failures in seconds. This is made possible by sub-second failure detection for both the master process and region servers.
- Oozie Scheduling Improvements
Oozie now offers improved crontab-style scheduling, with options such as running jobs on the first or last day of a month or quarter, or at a particular hour on weekdays.
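The calendar rules behind such schedules are easy to get wrong (month lengths, leap years, quarter boundaries), so here is a small Python sketch of the date logic only. It is not Oozie's coordinator syntax or API; the predicate functions and `run_times` are hypothetical helpers that list when a job following each rule would fire.

```python
import calendar
from datetime import date, datetime, timedelta
from typing import Callable, Iterator


def is_last_day_of_month(d: date) -> bool:
    # monthrange returns (weekday of day 1, number of days in the month).
    return d.day == calendar.monthrange(d.year, d.month)[1]


def is_first_day_of_quarter(d: date) -> bool:
    return d.day == 1 and d.month in (1, 4, 7, 10)


def is_last_day_of_quarter(d: date) -> bool:
    return d.month in (3, 6, 9, 12) and is_last_day_of_month(d)


def is_weekday(d: date) -> bool:
    return d.weekday() < 5  # Monday is 0, Friday is 4


def run_times(start: date, end: date, hour: int,
              rule: Callable[[date], bool]) -> Iterator[datetime]:
    """Yield one run at `hour`:00 for every day in [start, end] that matches `rule`."""
    day = start
    while day <= end:
        if rule(day):
            yield datetime(day.year, day.month, day.day, hour)
        day += timedelta(days=1)


for t in run_times(date(2013, 1, 1), date(2013, 12, 31), 10, is_last_day_of_quarter):
    print(t)
# 2013-03-31 10:00:00
# 2013-06-30 10:00:00
# 2013-09-30 10:00:00
# 2013-12-31 10:00:00
```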
Stay Tuned… There’s More!
Over the course of the next few weeks, we have a series of blog posts lined up with details on each of these features and more. We’ll also provide regular updates on the beta program, software updates and general insight into the progress of 2.0!