Progress of the Herd - The Apache Hadoop Ecosystem

The last couple of weeks have been a period of intense activity around the Apache projects that comprise the Hadoop ecosystem. While most of the headlines were accorded to Apache Hadoop 2 going GA, it would be remiss not to pay attention to the great progress being made in the Apache projects that complement Hadoop.

We have blogged about these over the course of the past week and the list below provides a quick summary of the phenomenal work contributed in the open by the folks driving these diverse and vital communities.

Apache Hadoop 2.2.0

Release Date: 15th October, 2013

Over five years in the making, Hadoop 2 is a major, stable release off the mainline trunk and represents the future of this important project. While there are many new features in the Hadoop 2 GA release, here are some highlights:

  • YARN

  • High Availability for HDFS

  • HDFS Federation

  • HDFS Snapshots

  • NFSv3 access to data in HDFS

  • Binary Compatibility for MapReduce applications between Hadoop v1 and Hadoop v2 to ease migration

  • Performance

  • Support for running Hadoop on Microsoft Windows

  • Integration testing for the entire Apache Hadoop ecosystem at the ASF.

Apache HBase 0.96

Release Date: 18th October, 2013

With 2,134 JIRA tickets closed, HBase 0.96 represents 14 months of development and a major step forward for this important project. Some of the notable new features include:

  • Reduced Mean Time to Recovery (MTTR)

  • Snapshots for HBase tables

  • Support for Microsoft Windows

  • Compaction Improvements

  • Wire Compatibility with Protocol Buffers

  • Data Type flexibility

  • Overhaul of Metrics framework

Apache Hive 0.12

Release Date: 15th October, 2013

Only five months in the making, Apache Hive 0.12 comprises over 420 closed JIRA tickets contributed by ten companies, with nearly 150,000 lines of code! Below are some highlights of the release, which represents delivery of phase 2 of the Stinger initiative.

  • Faster query planning including Metastore improvements

  • Predicate pushdown for ORC files

  • Performance enhancements such as parallel ORDER BY, LIMIT pushdown etc.

  • Enhanced SQL with support for VARCHAR, DATE and macros

Apache Ambari (Incubating) 1.4.1

Release Date: 21st October, 2013

Over 760 JIRAs have been resolved in the Ambari 1.4.1 release by more than 40 engineers. The following are the highlights of the Apache Ambari 1.4.1 release this week:

  • Support for Apache Hadoop 2 stack including Apache HBase 0.96, Apache Pig 0.12, Apache Hive 0.12 and Apache Oozie 4.0

  • Support for High Availability for HDFS NameNode

  • Added support for enabling Kerberos security for Hadoop 2

  • Support to work with SSL enabled Hadoop daemons

  • Support to work with web authentication enabled for Hadoop daemons

  • Added support for JDK 7 (and maintained support for JDK 6)

Apache Pig 0.12


Release Date: 14th October, 2013

Last week saw the release of Apache Pig 0.12. Some of the highlights of this release include:

  • Support for new ASSERT, IN, CASE operators

  • Streaming UDFs

  • Support for BigInteger and BigDecimal data-types

  • Support for Microsoft Windows

Apache Oozie 4.0.0

Release Date: 30th August, 2013

And last but not least, the Apache Oozie community made their 4.0 release, with highlights including:

  • Support for strict SLAs

  • Support for HCatalog

Acknowledgements

As always, it’s an honor and pleasure to innovate with the entire Apache Hadoop community – thanks to everyone who contributed!
