The Hortonworks Blog

Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets enterprise end users since it brings the largest number of developers together in a way that enables innovation to happen far faster than any single vendor could achieve and in a way that is free of friction for the enterprise.…

As a core component of the Modern Data Architecture (MDA), organizations rely on the Hortonworks Data Platform (HDP) for their mission critical functions which demand high availability and performance. Key to these organizations is simplified and consistent Hadoop Operations.

Join us for this workshop where we’ll cover the operational concerns of System Administrators & DevOps Engineers including installation, configuration, maintenance, security and performance topics.

Key Highlights include:

  • Hardware Recommendation and Sizing
  • OS tuning guide
  • Rapid and consistent deployment of clusters using Apache Ambari Blueprints
  • Cluster setup validation
  • Multi-tenancy with YARN
  • Security
  • HA and Business Continuity

The workshop will be a combination of slides plus demonstrations of the code in action.…

Talend is a Hortonworks Certified Technology Partner, and our guest blogger today is Shawn James, director, big data business development, Talend. Shawn and Jim Walker, director of product marketing at Hortonworks, are our guest speakers in an upcoming webinar on Feb. 12th.

If you are a data scientist, MapReduce or Hadoop developer, you are in demand given the massive increase in data science-based projects. These projects are being driven by the private sector of course, but also by a public sector that is looking to tackle a new range of use cases using big data.…

In August 2009, the Facebook Data Infrastructure Team published a white paper that outlined a warehousing solution over Hadoop. They called it Hive. And since that time, this project has not only emerged as the defacto standard for SQL in Hadoop, but with the help of the Stinger initiative it has progressed from a batch only framework with limited SQL interface to a near SQL:2011 compliant, fully interactive SQL query engine.…

Informatica users leveraging HDP are now able to see a complete end-to-end visual data lineage map of everything done through the Informatica platform. In this blog post, Scott Hedrick, director Big Data Partnerships at Informatica, tells us more about end-to-end visual data lineage.

Hadoop adoption continues to accelerate within mainstream enterprise IT and, as always, organizations need the ability to govern their end-to-end data pipelines for compliance and visibility purposes. Working with Hortonworks, Informatica has extended the metadata management capabilities in Informatica Big Data Governance Edition to include data lineage visibility of data movement, transformation and cleansing beyond traditional systems to cover Apache Hadoop.…

DataTorrent is a Hortonworks Certified Technology Partner and YARN Ready, offering an enterprise class real-time streaming platform on Hadoop and Hortonworks Data Platform. Thomas Weise, principal architect at DataTorrent, is our guest blogger today.

A while ago, DataTorrent announced a new initiative to integrate Kafka and YARN under the KOYA project. KOYA was proposed as KAFKA-1754 and well received by the community.

Why KOYA?

Kafka is becoming increasingly popular as the data bus to move data in and out of Hadoop clusters.…

Big data and cloud computing are top priorities in enterprise IT today. Organizations are adopting these two disruptive technologies because of the promise of lower cost, flexibility, portability and ease of management.

Today’s blog is another in a series discussing Apache Hadoop in the cloud as a key deployment option. Our guest blogger today is Sean Anderson, Manager of Data Service at Rackspace, the managed cloud company.

In 2012, Rackspace and Hortonworks partnered to expand the capabilities of Enterprise HadoopTM to both public cloud utility services and private clouds utilizing the popular open-source cloud platform Openstack.…

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs.

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines.…

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most important of those data access applications—the defacto standard for interactive SQL queries over petabytes of data in Hadoop.…

Novetta is an HDP Certified Technology Partner and YARN Certified, delivering agile big data analytics solutions with HDP. In this blog, Jennifer Reed, director of product management at Novetta, shares a recent customer use case in the Oil and Gas vertical market.

Energy companies discover and develop petroleum resources to meet the needs of the global economy. Often, this requires working in areas of the world where organized crime, piracy, and terrorist activity are common and, conversely, where governmental protection is limited.…

This guest blog post is from Alyssa Jarrett, product marketing manager at Splice Machine. Splice Machine is a Hortonworks Certified Technology Partner and provides one of the only Hadoop RDBMS to power a new generation of real-time applications and operational analytics. With its recent Certification with HDP, Splice Machine offers a 10x price/performance improvement over traditional relational databases.

Built on top of the HDFS and Apache HBase components in the Hortonworks Data Platform (HDP), Splice Machine is delighted to announce that it has completed the required integration testing with HDP.…

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs.

HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of the platform spanning data access to security to governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of Modern Data Architecture (MDA).…

VoltDB is a Certified Hortonworks Technology Partner and developers of an in-memory relational DBMS capable of supporting high volume OLTP and real-time analytics with Hortonworks Data Platform. Our guest blogger today is John Piekos, vice president of engineering at VoltDB.

It’s a common phrase here at VoltDB: Streaming Apps are Really Database Apps When You Use a Database that’s Fast Enough.

What does that mean?

We’re seeing a trend: developers are struggling to create interactive, real-time applications on fast streaming data.…

In our series on Data Science and Hadoop, predicting airline delays, we demonstrated how to build predictive models with Apache Hadoop, using existing tools. In part 1, we employed Pig and Python; part 2 explored Spark, ML-Lib and Scala.

Throughout the series, the thesis, theme, topic, and algorithms were similar. That is, we wanted to dismiss the misconception that data scientists – when applying predictive learning algorithms, like Linear Regression, Random Forest or Neural Networks to large datasets – require dramatic changes to the tooling; that they need dedicated clusters; and that existing tools will not suffice.…

On December 18th, 2014, Hortonworks presented the last of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Jeff Sposetti and Mahadev Konar hosted the last webinar in the series.

After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data Architecture (MDA), Jeff Sposetti and Mahadev Konar introduced Apache Ambari and discussed Ambari innovations now included in HDP 2.2:

  • Configuration Enhancements, including Versioning & History
  • Ambari Administration, including Views Framework
  • Ambari Stacks “Stacks Advisor”

Here is the complete recording of the Webinar

Here are the presentation slides.