cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

From the Dev Team

SQL is the most popular use case for the Hadoop user community, and Apache Hive is still the defacto standard. Early this week, the Apache Hive community released Apache Hive 1.2.0. Already the third release this year, the Hive developer community continues to improve the release and grow its team, with 11 Hive contributors promoted […]

This is the third post in a series that explores the theme of enabling diverse workloads in YARN.  Our introductory post  to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2, and a related post on CPU scheduling. Introduction One of the core responsibilities of YARN is monitoring and […]

Kristen Hardwick, Vice President of Big Data Solutions at Spry, Inc is our guest blogger. In this blog, Kristen shares performance analysis during Spryinc’s evaluation of Apache Hive with Tez as a fast query engine. In early 2014, Spry developed a solution that heavily utilized Hive for data transformations. When the project was complete, three […]

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises utilize Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster needed to be highly available in face of failures […]

This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2. Introduction When it comes to managing resources in YARN, there are two aspects that we, […]

Historically, the strength of a platform lies in the abilities of developers to learn, try, and build against the platform APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and efforts of the developer community that is driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent […]

Two weeks ago, Apache ORC became an Apache top-level project within the Apache Software Foundation (ASF). This step represents a major step forward for the project, and it is representative of its momentum been built by a broad community of developers. What is ORC and why is it useful? Back in January 2013, we created […]

This is the 3rd post in a series that explores the theme of supporting rolling-upgrades & downgrades of a Hadoop YARN cluster. See the introductory post here. Background and Motivation Before HDP 2.2, Hadoop MapReduce applications depended on MapReduce jars being deployed on all the nodes in a cluster. The java classpath of all the […]

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment. Tez View: Understand and Optimize Jobs in your Cluster The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify […]

It is that time of the year again! Annual Apache HBase conference, HBaseCon 2015, is around the corner, and as always, it is packed with action and illuminating talks. The conference is this Thursday, May 7th. As in the previous years, there will be 4 tracks covering Operations, Internals, Ecosystem and Use Cases. Here are […]

This is the third post in a series that explores the theme of supporting rolling-upgrades & downgrades of a Hadoop YARN cluster. See here for an introductory post. Introduction Carrying out a rolling upgrade/downgrade of all nodes in a Hadoop cluster can be a very disruptive process. Before HDP 2.2, if a NodeManager (NM) were […]

We at Hortonworks live by a few core principles: Innovate at the core of Hadoop Make Hadoop be an Enterprise Class Data Platform Do it all in open source Enable the ecosystem Our vision of “Hadoop Everywhere” is shared by our partner community who bring their industry expertise, unique software value-add and passion for customer […]

The Apache Hadoop community is happy to announce the release of Apache Hadoop 2.7.0! We want to express our gratitude to every contributor, reviewer and committer. The Hadoop community fixed 923 JIRAs in total as part of the 2.7.0 release. Of the 923 fixes: 259 were in Hadoop Common 350 were in HDFS 253 were […]

Introduction Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs in Scala, Java, and Python that allow data workers to efficiently execute machine learning algorithms that require fast iterative access to datasets. Spark on Apache Hadoop YARN enables deep integration with Hadoop and other YARN enabled workloads in the […]