Mahadev was on the team at Yahoo! in 2006 that started developing what became Apache Hadoop. Since then, he has also held leadership positions in the Apache ZooKeeper and Apache Ambari projects. He is an architect and project management committee member for Apache Ambari, Apache ZooKeeper and Apache Hadoop.…
The Hortonworks Blog
What a difference a year makes! Last fall, Ambari was a nascent Apache project that had recently shipped an inaugural release in the community. Fast forward a bit: at the beginning of this year, Ambari shipped what has become the foundation for rapid innovation. Now Ambari has become a key member of the Apache Hadoop project ecosystem and a trusted operational platform for many companies.
Let’s take a brief look at the community’s amazing accomplishments over the past year, and then take some time to look forward.…
The Hadoop Distributed File System is the reliable and scalable data core of the Hortonworks Data Platform. In HDP 2.0, YARN + HDFS combine to form the distributed operating system for your Data Platform, providing resource management and scalable data storage to the next generation of analytical applications.
Over the past six months, HDFS has gained a slew of major features covering Enterprise Multi-tenancy, Business Continuity Processing and Enterprise Integration:
- Enabled automated failover with a hot standby and full stack resiliency for the NameNode master service
- Added enterprise standard NFS read/write access to HDFS
- Enabled point in time recovery with Snapshots in HDFS
- Added wire encryption for the HDFS Data Transfer Protocol
Looking forward, there are evolving patterns in Data Center infrastructure and Analytical applications that are driving the evolution of HDFS.…
This post is the sixth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:
- Apache Tez: A New Chapter in Hadoop Data Processing
- Data Processing API in Apache Tez
- Runtime API in Apache Tez
- Writing a Tez Input/Processor/Output
- Apache Tez: Dynamic Graph Reconfiguration
- Reusing containers in Apache Tez
- Introducing Tez Sessions
Tez follows the traditional Hadoop model of dividing a job into individual tasks, all of which are run as processes via YARN, on the users’ behalf – for isolation, among other reasons.…
Today, with overwhelming partner support, we announced GA of Hortonworks Data Platform 2.0 (HDP 2.0). With 17 certified partners and many more in the works, organizations can confidently get started taking advantage of Hadoop 2.0 and its YARN-based architecture, knowing that the technologies they rely on run on HDP 2.0.
With a YARN-based architecture that serves as the operating system for Hadoop, HDP 2.0 takes Hadoop beyond single-use, batch processing to a fully functional, multi-use platform that enables batch, interactive, online and stream processing.…
Typical delivery of enterprise software involves a very controlled date with a secret roadmap designed to wow prospects, customers, press and analysts…or at least that is the way it usually works. Open source, however, changes this equation.
As described here, the vision for extending Hadoop beyond its batch-only roots in support of interactive and real-time workloads was set by Arun Murthy back in 2008. Work on YARN, the key technology for enabling this vision, started in earnest in 2011. YARN was declared GA by the community in the recent Apache Hadoop 2.2 release and is now delivered for mainstream enterprises and the broader commercial ecosystem with the release of Hortonworks Data Platform 2.0.…
The last couple of weeks have been a period of intense activity around the Apache projects that comprise the Hadoop ecosystem. While most of the headlines were accorded to Apache Hadoop 2 going GA, it would be remiss not to pay attention to the great progress being made in the Apache projects that complement Hadoop.
We have blogged about these over the course of the past week and the list below provides a quick summary of the phenomenal work contributed in the open by the folks driving these diverse and vital communities.…
Today we are proud to announce the general availability of Apache Pig 0.12!
If you are a Pig user and you’ve been yearning for additional languages, more data validation tools, and more expressions, operators and data types, then read on. Version 0.12 includes all of those additions, and now Pig runs on Windows without Cygwin.
This was a great team effort over the past six months with over 30 engineers from Twitter, Yahoo, LinkedIn, Netflix, Microsoft, IBM, Salesforce, Mortardata, Cloudera and several others (including Hortonworks of course).…
Today we are proud to announce the delivery of Apache Ambari 1.4.1. Ambari 1.4.1 combines many months of work in the community advancing the Ambari codebase. Over 760 JIRAs have been resolved since the Ambari 1.2.5 release. We would like to thank the nearly 40 engineers who contributed to help make this release possible.
Hello Hadoop 2, Meet Apache Ambari
The most important addition to Ambari 1.4.1 is support for installing, managing and monitoring a cluster based on the Hadoop 2 stack.…
The Hortonworks HBase team is excited to see HBase 96 released. It represents a broad community effort and a massive amount of work that has been building for more than a year.
HBase 96 closes out over 2000 issues (2134 Jira tickets to be exact) and represents the collective work of a VERY active community. Kudos to everyone involved! As the authors of a recent Apache blog alluded to, the HBase community is very healthy and includes developers from many companies including Hortonworks, Yahoo!, Cloudera, Salesforce, eBay, Intel, and Facebook, to name just a few.…
This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the 8th post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
- Stabilizing YARN APIs for Apache Hadoop 2
In YARN, applications perform their work by running containers, which today map to processes on the underlying operating system.…
You did it! Last Sunday we challenged you to “Learn Hadoop in 7 days”. We hope that you have risen to the test and kept up with the tutorials we’ve posted each day through Twitter and Facebook. These tutorials should have helped you delve into:
- How to process data with Apache Pig and Apache Hive
- How to use HCatalog, Pig and Hive commands
- How to use Excel 2013 to access and analyze Hadoop data
- And much more…
This post’s Principal Author: Ming Ma, Software Development Manager, eBay. With contribution from Mayank Bansal (eBay), Devaraj Das (Hortonworks), Nicolas Liochon (Scaled Risk), Michael Weng (eBay), Ted Yu (Hortonworks), John Zhao (eBay)
eBay runs Apache Hadoop at extreme scale, with tens of petabytes of data. Hadoop was created for computing challenges like ours, and eBay runs some of the largest Hadoop clusters in existence.
Our business uses Apache HBase to deliver value to our customers in real time, and we are sensitive to any failures because prolonged recovery times significantly degrade site performance and result in material loss of revenue. …
Stinger is not a product. Stinger is a broad, community-based initiative to bring interactive query at petabyte scale to Hadoop. And today, as representatives of this open, community-led effort, we are very proud to announce delivery of Apache Hive 0.12, which represents the critical second phase of this project!
Only five months in the making, Apache Hive 0.12 comprises over 420 closed JIRA tickets contributed by ten companies, with nearly 150 thousand lines of code! …
An important tool in the Hadoop developer toolkit is the ability to look at key metrics for a MapReduce job – to understand the performance of each job and to optimize future job runs.
Change from MapReduce v1 and HDP 1.x
In MapReduce-v2 on YARN in HDP 2.0, the JobTracker no longer exists.…