
The Hortonworks Blog

Debugging distributed systems can be difficult largely because they are designed to run on many (possibly thousands of) hosts in a cluster. This process typically involves monitoring and analyzing log files spread across the cluster, and if the necessary information is not being logged, service restarts and job redeployment may be required. Not only is […]

Register now for the June 8th and 9th event in Novi, MI! The 16th Annual TU-Automotive Conference and Exhibition kicks off next week in Novi, MI, just outside of Detroit. This two-day conference features 150 speakers, 300 booths, and 3000 attendees. It’s the world’s largest conference and exhibition dedicated to automotive technology and innovation. Click […]

At Hadoop Summit San Jose we are excited to be joined by experts in the Healthcare field. Here are just a few of the sessions focussed on Healthcare, but you need to register to attend Hadoop Summit. How Hadoop and a Modern Data Platform Can Enable Transformation in Healthcare Speaker: Beata Puncevic from Blue Cross Blue Shield […]

By 2020, it is estimated that more than 40 ZB of data will be generated annually. This “Big Data” is transforming every single industry. In this blog, I will talk about how Big Data is transforming Public Transportation, especially Rail Transportation. Big Data is transforming both the Plan phase and Operations phase of the Rail […]

There were a lot of great activities and sessions at the recent Apache: Big Data North America in Vancouver, B.C. I enjoyed the technical level of the sessions and meeting others who contribute to projects in the Apache Software Foundation (ASF). The sessions I went to had a high level of interesting technical content, with […]

The world’s top authorities on Apache Hadoop convene at Hadoop Summit San Jose, and one of the top questions to be answered concerns the future and direction of Hadoop. Sanjay Radia – Founder and Architect, Hortonworks – led the track, which selected 13 sessions around this topic. I asked Sanjay what he hoped would […]

At Hadoop Summit San Jose, the Data Science, Analytics and Spark track is sure to be packed. Ram Sriharsha – Product Manager Apache Spark, Databricks – summarizes the 16 sessions in the track as providing technical guidance around: Leveraging Hadoop for analytics is a key use case across industries and represents a critical value proposition for Hadoop. This track […]

Hadoop Summit San Jose is just around the corner. I am amazed at the depth and breadth of the technical sessions and was looking at the Application Development track: Application Development YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empowers businesses to […]

In preparation for Hadoop Summit San Jose, I asked the Chair for the Apache Committer Insights track, Andy Feng – VP Architecture, Yahoo!, which were the top 3 sessions he would recommend. Although it was tough to choose only 3, he recommended: HDFS: Optimization, Stabilization and Supportability Speakers: Chris Nauroth from Hortonworks and Arpit Agarwal […]

The Ambari Metrics System (AMS) released with Ambari 2.0 about a year ago is an Ambari-native pluggable and scalable system for collecting and querying Hadoop Metrics (AMBARI-5707). Since that time, the community has been working hard at adding new capabilities to the system and recently announced the availability of Ambari 2.2.2 where AMS now includes […]

The first post in this three-part series brought to the fore critical strategic trends in the Wealth & Asset Management (WM) space – the most lucrative portion of Banking. This second post will describe an innovation framework for a forward-looking WM institution. The final post will cover technology architecture and business strategy recommendations for WM CXOs. Introduction: […]

Part 1: A Little History In this series of blog posts, we will provide an in-depth look at select features introduced with the release of Apache Storm (Storm) 1.0. To kick off the series, we’ll take a look at how Storm has evolved over the years from its beginnings as an open source project, up to the […]

Before we drill down into how Hortonworks partnered with Arizona State University (ASU) to design and develop a platform to discover genomic links to cancer, let’s take a look at a few of cancer’s fundamental attributes. Cancer is both a complicated and complex disease. Cancer is complicated because it is not actually a single disease, but rather the […]

A guest blog post from Scott Schlesinger, Principal, Ernst & Young LLP In July 2015, EY announced its EY Warranty Analytics service offering for the SAP HANA® platform. The service includes EY’s advanced analytics for use with SAP® technology to monitor warranty claims, with the goals of identifying fraudulent activity, reducing costs and improving quality. Automobile […]

Apache Hadoop® exists within a broader ecosystem of enterprise analytical packages. This includes ETL tools, ERP and CRM systems, enterprise data warehouses, data marts and others. Modern workloads flow from these various traditional analytical sources into Hadoop and then often back out again. What dataset came from which system, when and how did it change over […]