Expanding the Apache Hadoop Community to Windows

This post was co-authored by Arun Murthy.

It’s been an exciting time for the Apache Hadoop community, with new and innovative projects happening around performance (Apache Tez, part of the Stinger initiative) and security (Apache Knox). In addition, Hortonworks recently announced the availability of the beta version of Hortonworks Data Platform for Windows.

One of the things we believe strongly in here at Hortonworks is community-driven open source and, obviously, the bigger the community, the better. A community opens itself up to new members through the development choices it makes, and last week the Apache Hadoop community voted to significantly expand itself by agreeing to accept enhancements into the core trunk that make Apache Hadoop run natively on Microsoft Windows platforms, including Windows Server and Windows Azure. These enhancements were the result of many months of joint engineering work by Microsoft and Hortonworks, and we are glad to see the community accept and embrace them. As is common in the Apache Hadoop project, we developed these enhancements in a development branch for over a year, and once this work was complete, the community voted to incorporate the changes into the mainline trunk.

Here are the highlights of the work done:

  • Command-line scripts for the Hadoop surface area
  • Mapping the HDFS permissions model to Windows
  • Abstraction and reconciliation of mismatches between path semantics in Java and Windows
  • Native Task Controller for Windows
  • Implementation of a Block Placement Policy to support cloud environments, more specifically Windows Azure
  • Implementation of Hadoop native libraries for Windows (compression codecs, native I/O)
  • Fixes for several reliability issues, including race conditions, intermittent test failures and resource leaks
  • Several new unit test cases written for the above changes

This is great news for the Apache Hadoop ecosystem because it enables a whole new swath of organizations using Microsoft Windows and, equally importantly, their end users to work with Apache Hadoop in their preferred environment. The substantial ecosystem of technology vendors who build solutions for the Microsoft Windows platform can now integrate those solutions with Hadoop on Windows. Additionally, system integrators who have invested in and built expertise around the Windows platform will be able to extend their skills to Hadoop on Windows.

Of course, it is also a great demonstration of contributing back to the community so that anyone can benefit from this work. It is notable, too, that our collaborative efforts with Microsoft extend beyond core Apache Hadoop to projects like Apache Hive, Apache Pig, Apache Sqoop, Apache Oozie, Apache HCatalog and Apache HBase.

We at Hortonworks would like to congratulate Microsoft for giving back to the Apache Hadoop community and to extend a warm welcome. The community can look forward to seeing much more as we work together in the near future.
