The Hortonworks Blog

Two weeks ago, Apache ORC became a top-level project within the Apache Software Foundation (ASF). This represents a major step forward for the project and reflects the momentum built by a broad community of developers.

What is ORC and why is it useful?

Back in January 2013, we created ORC files as part of the Stinger initiative to massively speed up Apache Hive and improve the storage efficiency of data stored in Apache Hadoop.…
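To show why a columnar layout helps storage efficiency, here is a minimal Java sketch that run-length encodes a single column at a time. It is only an illustration of the general idea. ORC has its own integer run-length and dictionary encodings, and this is not its format. The class `ColumnarRleSketch`, its methods and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: NOT ORC's real encoding, just the general idea
// that values stored column by column tend to repeat and compress well.
final class ColumnarRleSketch {

    // Run-length encodes one column into {value, runLength} pairs.
    static List<long[]> runLengthEncode(long[] column) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            long value = column[i];
            int run = 1;
            // Extend the run while the next value is identical.
            while (i + run < column.length && column[i + run] == value) {
                run++;
            }
            runs.add(new long[] {value, run});
            i += run;
        }
        return runs;
    }

    // Expands {value, runLength} pairs back into the original column.
    static long[] runLengthDecode(List<long[]> runs) {
        int total = 0;
        for (long[] r : runs) {
            total += (int) r[1];
        }
        long[] out = new long[total];
        int pos = 0;
        for (long[] r : runs) {
            for (long k = 0; k < r[1]; k++) {
                out[pos++] = r[0];
            }
        }
        return out;
    }

    // Copies one field out of row-oriented records into its own column array.
    static long[] column(long[][] rows, int field) {
        long[] col = new long[rows.length];
        for (int r = 0; r < rows.length; r++) {
            col[r] = rows[r][field];
        }
        return col;
    }

    public static void main(String[] args) {
        // Made-up rows of (year, regionId, amount).
        long[][] rows = {
            {2015, 1, 10}, {2015, 1, 25}, {2015, 1, 7}, {2015, 2, 12}, {2015, 2, 30}
        };
        String[] names = {"year", "regionId", "amount"};
        for (int f = 0; f < names.length; f++) {
            long[] col = column(rows, f);
            System.out.println(names[f] + ": " + col.length + " values -> "
                + runLengthEncode(col).size() + " runs");
        }
    }
}
```

In this toy data the low-cardinality `year` and `regionId` columns collapse to one and two runs, while the `amount` column gains nothing, which is one reason columnar formats can choose an encoding per column.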

Data collected from connected vehicles through embedded smart sensors is transforming the automotive industry. Is this hype or reality?

To discuss the reality of this transformation, tackle the management of data streams from connected cars, and share new data architectures that process, manage and analyze large volumes of data, automakers and key industry innovators will gather in Berlin for Telematics Berlin 2015 on May 11-12.

Data Deluge

Because legacy architectures have limited capacity to store streams of unstructured and varied data at petabyte scale, and cannot analyze that data in real time to deliver value and insights, automakers are looking to next-generation data platforms.…

Hortonworks subscribers across all major industries use Hortonworks Data Platform (HDP) to power advanced analytics applications for data discovery and predictive analytics. The insurance industry uses Hadoop to better leverage unstructured information to strengthen subrogation opportunities, stop fraud and minimize claims leakage. This requires new capabilities for data discovery.

Cindy Maike is the GM for Insurance Solutions at Hortonworks, and next week she will be a panelist on the use of analytics in claims at the inaugural Analytics for Insurance Canada 2015 event.…

On May 22, 2015, Hortonworks and codecentric AG will host a free Community Day as part of their long-standing partnership. The agenda features experience reports from day-to-day enterprise practice and the latest developments from Hadoop Summit Europe 2015, including Spark-on-YARN, Apache Zeppelin and Hadoop's enterprise features for security and data governance.

Exciting Big Data Projects, from “Atlas” to “Zeppelin”


Florian Herrmann and Daniel Schmitt of Fiducia IT AG will get things off to an exciting start: they will demonstrate how the Volksbanken and Raiffeisenbanken implemented a Lambda Architecture to detect fraud attempts.…

This is the third post in a series that explores the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. See the introductory post here.

Background and Motivation

Before HDP 2.2, Hadoop MapReduce applications depended on MapReduce jars being deployed on every node in a cluster. The Java classpath of all the tasks and of the ApplicationMaster of a MapReduce job was set to point to these deployed jars.…
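A small Java model can make the consequence of node-local jars concrete during a rolling upgrade. This is a hypothetical sketch, not Hadoop code: `NodeLocalClasspathSketch`, the jar directory path and the version strings are all made up for illustration. It shows that when each task resolves its classpath against whatever is installed on its own node, a single job running mid-upgrade can see more than one MapReduce version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model (not Hadoop code): each task's classpath points at the
// MapReduce jars installed on the node that happens to run it.
final class NodeLocalClasspathSketch {

    // Builds a task classpath from the node-local jar directory (path is illustrative).
    static String taskClasspath(String nodeMapReduceVersion) {
        return "/usr/lib/hadoop-mapreduce-" + nodeMapReduceVersion + "/*";
    }

    // Returns the distinct MapReduce versions one job sees across the nodes its tasks run on.
    static Set<String> versionsSeenByJob(Map<String, String> nodeVersions, String... taskNodes) {
        Set<String> versions = new TreeSet<>();
        for (String node : taskNodes) {
            versions.add(nodeVersions.get(node));
        }
        return versions;
    }

    public static void main(String[] args) {
        // Mid-upgrade snapshot: node1 already upgraded, node2 and node3 not yet (made-up versions).
        Map<String, String> nodeVersions = new LinkedHashMap<>();
        nodeVersions.put("node1", "2.6.0");
        nodeVersions.put("node2", "2.4.0");
        nodeVersions.put("node3", "2.4.0");

        for (Map.Entry<String, String> e : nodeVersions.entrySet()) {
            System.out.println(e.getKey() + " classpath: " + taskClasspath(e.getValue()));
        }
        System.out.println("versions seen by one job: "
            + versionsSeenByJob(nodeVersions, "node1", "node2", "node3"));
    }
}
```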

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment.

Tez View: Understand and Optimize Jobs in your Cluster

The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify which jobs consume the most resources and which are the best candidates to optimize.

With the Tez View you can quickly spot Hive or Pig jobs that are taking the longest, writing the most data or consuming the most CPU.…
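The kind of ranking described above can be sketched in a few lines of Java. This is a hypothetical illustration, not the Tez View's implementation or the Ambari API: `JobRankingSketch`, `JobSummary` and its fields (`durationMs`, `bytesWritten`, `cpuMs`) are assumed names, and the sample jobs are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: ranks jobs by one metric, as a "top consumers" view might.
final class JobRankingSketch {

    // Assumed per-job metrics; not the Tez or Ambari data model.
    static final class JobSummary {
        final String name;
        final long durationMs;
        final long bytesWritten;
        final long cpuMs;

        JobSummary(String name, long durationMs, long bytesWritten, long cpuMs) {
            this.name = name;
            this.durationMs = durationMs;
            this.bytesWritten = bytesWritten;
            this.cpuMs = cpuMs;
        }
    }

    // Returns the names of the top-n jobs by the given metric, highest first.
    static List<String> topBy(List<JobSummary> jobs, ToLongFunction<JobSummary> metric, int n) {
        List<JobSummary> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingLong(metric).reversed());
        List<String> names = new ArrayList<>();
        for (int i = 0; i < Math.min(n, sorted.size()); i++) {
            names.add(sorted.get(i).name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<JobSummary> jobs = new ArrayList<>();
        jobs.add(new JobSummary("hive_daily_rollup", 540_000L, 12_000_000_000L, 2_100_000L));
        jobs.add(new JobSummary("pig_clickstream_etl", 900_000L, 3_000_000_000L, 4_500_000L));
        jobs.add(new JobSummary("hive_adhoc_query", 60_000L, 50_000_000L, 90_000L));

        System.out.println("longest:      " + topBy(jobs, j -> j.durationMs, 2));
        System.out.println("most written: " + topBy(jobs, j -> j.bytesWritten, 2));
        System.out.println("most CPU:     " + topBy(jobs, j -> j.cpuMs, 2));
    }
}
```

Passing the metric as a `ToLongFunction` keeps one ranking routine for all three questions (longest, most data written, most CPU) rather than three near-identical sorts.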

Argyle Data is a Hortonworks Technology Partner that recently certified on the Hortonworks Data Platform (HDP) and was awarded the OPS Ready badge for its integration with Apache Ambari. Here, Dr. Ian Howells talks about how Argyle Data is helping customers detect fraud faster with its native Hadoop application.

We believe that the world is moving to a new generation of native Apache Hadoop applications. When you build your application from the ground up on Hadoop, it is critical to make it simple for any organization to provision, manage and monitor that application at scale.…

It is that time of the year again!

The annual Apache HBase conference, HBaseCon 2015, is around the corner, and as always, it is packed with action and illuminating talks.

The conference is this Thursday, May 7th. As in previous years, there will be four tracks covering Operations, Internals, Ecosystem and Use Cases.

Here are a few sessions that I am personally excited about:

This year, SQL solutions are well represented.…

This week we are participating in the Microsoft Ignite conference in Chicago. Microsoft Ignite covers the full range of Microsoft technologies and the professionals who work with them, and we are excited to demonstrate all of the ways we’ve been working with Microsoft to Do Hadoop together. As a long-time Microsoft partner, we are glad to participate in this event for the third year in a row, showcasing a history of joint engineering and commitment to the Microsoft platforms and users.…

This is the third post in a series that explores the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. See here for an introductory post.


Carrying out a rolling upgrade or downgrade of all nodes in a Hadoop cluster can be a very disruptive process. Before HDP 2.2, if a NodeManager (NM) was brought down, all active containers on that node were killed. This significantly interrupted every application running on the cluster being upgraded or downgraded.…
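A minimal Java model can contrast the pre-HDP 2.2 behavior with a work-preserving restart, in which containers keep running while the NM process is down. The class `NodeManagerRestartSketch` and its methods are hypothetical and are not YARN code. The "persisted state" list is only a stand-in for an on-disk state store. In Apache Hadoop 2.6 and later, the real NodeManager recovery feature is enabled through the `yarn.nodemanager.recovery.enabled` property; the model below only mirrors the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not YARN code) of an NM restart with and without recovery.
final class NodeManagerRestartSketch {

    private final boolean recoveryEnabled;
    // Containers the NM process currently tracks in memory.
    private final List<String> runningContainers = new ArrayList<>();
    // Stand-in for a state store that outlives the NM process.
    private final List<String> persistedState = new ArrayList<>();

    NodeManagerRestartSketch(boolean recoveryEnabled) {
        this.recoveryEnabled = recoveryEnabled;
    }

    void launch(String containerId) {
        runningContainers.add(containerId);
        if (recoveryEnabled) {
            // Record the container so a restarted NM can find it again.
            persistedState.add(containerId);
        }
    }

    // Simulates stopping and restarting the NM; returns the containers still running afterwards.
    List<String> restart() {
        runningContainers.clear(); // the old NM process's in-memory state is gone either way
        if (recoveryEnabled) {
            // Work-preserving restart: containers kept running, the new NM reloads them.
            runningContainers.addAll(persistedState);
        }
        // Without recovery, shutting down the NM killed its containers, so nothing survives.
        return new ArrayList<>(runningContainers);
    }

    public static void main(String[] args) {
        NodeManagerRestartSketch legacy = new NodeManagerRestartSketch(false);
        legacy.launch("container_01");
        legacy.launch("container_02");
        System.out.println("without recovery, still running: " + legacy.restart());

        NodeManagerRestartSketch preserving = new NodeManagerRestartSketch(true);
        preserving.launch("container_01");
        preserving.launch("container_02");
        System.out.println("with recovery, still running: " + preserving.restart());
    }
}
```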

It’s going to be a big week at EMC World! We’ll be exhibiting at the event and there are a number of opportunities to meet with us and hear about the partnership between EMC and Hortonworks. We look forward to seeing you there!


Hortonworks will be in booth #132, right next to the EMC Open@EMC booth. We’d love to meet with you to discuss how EMC Isilon and the Hortonworks Data Platform deliver a Modern Data Architecture.…

We at Hortonworks live by a few core principles:

  • Innovate at the core of Hadoop
  • Make Hadoop an enterprise-class data platform
  • Do it all in open source
  • Enable the ecosystem

Our vision of “Hadoop Everywhere” is shared by our partner community, whose members bring their industry expertise, unique software value-add and passion for customer success to enable transformational change across our joint customers. As a Hadoop community, we are succeeding every day in transforming enterprises into data-first organizations.…

Having just returned from our Hadoop Summit Europe event, I was struck by the number of sessions that involved large scale businesses outlining the impact of their advanced analytic applications (built on Hadoop) and how those analytics are empowering better business decisions.

The story of business value is significant. Session after session, representatives from various industries talked about how their modern data architectures with Hadoop led to increased agility, new innovative customer experiences, and lower cost structures.…

On April 30, learn from experts at Hortonworks, Cisco, and Red Hat about accelerating the implementation of a scalable, cost-efficient and robust Big Data solution. Here is a sneak preview of what you’ll hear from our speakers:

  • Ali Bajawa, Senior Partner Solution Engineer, Hortonworks
  • Ron Graham, System Engineer for Big Data Analytics, Cisco
  • Irshad Raihan, Senior Principal, Big Data Product Marketing, Red Hat

1. What should a company consider when looking for a big data solution?…

The Apache Hadoop community is happy to announce the release of Apache Hadoop 2.7.0! We want to express our gratitude to every contributor, reviewer and committer.

The Hadoop community fixed 923 JIRAs in total as part of the 2.7.0 release. Of the 923 fixes:

  • 259 were in Hadoop Common
  • 350 were in HDFS
  • 253 were in YARN
  • 61 were in MapReduce
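As a quick arithmetic check, the per-component counts listed above should add up to the stated total of 923. The short Java sketch below (class and method names are illustrative) sums them and prints each component's share.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sums the per-component Hadoop 2.7.0 fix counts quoted in the post.
final class Hadoop270FixCounts {

    static Map<String, Integer> counts() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Hadoop Common", 259);
        m.put("HDFS", 350);
        m.put("YARN", 253);
        m.put("MapReduce", 61);
        return m;
    }

    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> byComponent = counts();
        int total = total(byComponent);
        for (Map.Entry<String, Integer> e : byComponent.entrySet()) {
            // Share of all fixes, as a percentage with two decimals.
            System.out.printf("%-13s %4d  %5.2f%%%n", e.getKey(), e.getValue(),
                100.0 * e.getValue() / total);
        }
        System.out.println("total: " + total);
    }
}
```

The counts do sum to 923, with HDFS accounting for the largest share at roughly 38%.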

Hadoop 2.7.0 is the first Hadoop release in 2015, following late last year’s 2.6.0.…