Innovation in the Open

Working with the Community, for the Enterprise
These labs offer an open roadmap for the development of the Hortonworks Data Platform, outlining what we have delivered and what we continue to deliver to lead innovation within the Hadoop ecosystem.

YARN: Engineering Hadoop from the core

YARN is the ‘data operating system’ within Hadoop that enables common sets of data to be processed simultaneously by multiple applications.

Introduced as MR-279 by Arun Murthy in 2009, work on Hadoop’s next-generation architecture powered by YARN culminated in 2013 with the release of Hadoop 2.

Today, YARN sits at the core of Hadoop and is the center of our focus on innovation in and around Hadoop. It has become the enabling technology behind organizations' transition to a modern data architecture.


Arun Murthy and more than 27 Hortonworks engineers are committers to core Hadoop.

New applications: Meeting enterprise requirements

YARN provides the foundation for a wide variety of applications to run natively in Hadoop, including interactive SQL processing (Hive), iterative in-memory analytics (Spark), and real-time stream processing (Storm). YARN also provides a consistent approach for applications that address enterprise requirements in operations, security and data governance.

Speed, Scale and SQL Semantics
The Stinger Initiative is a broad, community-based effort to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.
The performance changes we are making today will transform Hive into a single tool that Hadoop users can rely on for report generation, ad hoc queries, and large batch jobs spanning tens or hundreds of terabytes.
Stream Data Processing
Early adopters are using stream processing engines such as Apache Storm to analyze data in real time. Hortonworks has made an engineering commitment to deeply integrate Storm with Hadoop.
We are committed to deeply integrating Storm with Hadoop, specifically as a supported component of the 100% open source Hortonworks Data Platform.
In-memory Processing with Spark
As Spark emerges as a tool for fast in-memory data processing, Hortonworks has certified Spark as YARN Ready, and we are working with the community to prepare Spark for wider use within the enterprise.
Simplified Data Processing for Hadoop
The goal of the Data Management Initiative is to simplify the creation of data processing solutions for Hadoop. This effort will help enterprises construct solutions that maximize reuse and consistency.
As organizations move more and more data into Hadoop, the requirement to intelligently and automatically categorize and move data has become paramount. Projects like Apache Falcon have been created to meet these needs.
Security for Enterprise Hadoop
A roadmap for flexible, accountable, integrated enterprise security in Hadoop. The roadmap is organized around security best practices for authentication, authorization, accounting and data protection.
Open, Integrated & Intuitive IT Tools
A completely open set of features for provisioning, managing and monitoring Enterprise Hadoop clusters. These features integrate easily with existing IT systems behind a single pane of glass, providing operational control and deep insight into cluster performance.

Deep engineering partnerships

Working closely with partners, Hortonworks is bringing Hadoop to new platforms and new environments:

Elastic Hadoop on OpenStack
Project Savanna aims to provide operational agility and deployment flexibility for Hadoop across public and private clouds.
Expanding Hadoop with Microsoft
Microsoft and Hortonworks are collaborating in the open to expand the reach of Apache Hadoop and its ecosystem components.
Collaboration for Enterprise Data Apps
The expanded strategic alliance between Hortonworks and Red Hat has two open source leaders collaborating to develop the best in enterprise data solutions.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade, having been built, tested and hardened with enterprise rigor.
HDP 2.1 Webinar Series
Join us for a series of talks on some of the new enterprise functionality available in HDP 2.1, including data governance, security, operations and data access.
