From the Dev Team

Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets […]

As a core component of the Modern Data Architecture (MDA), organizations rely on the Hortonworks Data Platform (HDP) for their mission-critical functions, which demand high availability and performance. Key to these organizations is simplified and consistent Hadoop Operations. Join us for this workshop where we’ll cover the operational concerns of System Administrators & DevOps […]

In August 2009, the Facebook Data Infrastructure Team published a white paper that outlined a warehousing solution over Hadoop. They called it Hive. Since that time, this project has not only emerged as the de facto standard for SQL in Hadoop, but with the help of the Stinger initiative it has progressed from a batch […]

Big data and cloud computing are top priorities in enterprise IT today. Organizations are adopting these two disruptive technologies because of the promise of lower cost, flexibility, portability and ease of management. Today’s blog is another in a series discussing Apache Hadoop in the cloud as a key deployment option. Our guest blogger today is […]

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs. HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of […]

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most […]

This guest blog post is from Alyssa Jarrett, product marketing manager at Splice Machine. Splice Machine is a Hortonworks Certified Technology Partner and provides one of the only Hadoop RDBMS to power a new generation of real-time applications and operational analytics. With its recent Certification with HDP, Splice Machine offers a 10x price/performance improvement over […]

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs. HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of […]

In our series on Data Science and Hadoop, predicting airline delays, we demonstrated how to build predictive models with Apache Hadoop, using existing tools. In part 1, we employed Pig and Python; part 2 explored Spark, MLlib and Scala. Throughout the series, the thesis, theme, topic, and algorithms were similar. That is, we wanted to […]

On December 18th, 2014, Hortonworks presented the last of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Jeff Sposetti and Mahadev Konar hosted the last webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data Architecture […]

Apache HBase is the online database natively integrated with Hadoop, making HBase the obvious choice for applications that rely on Hadoop’s scale and flexible data processing. With the Hortonworks Data Platform 2.2, HBase High Availability has taken a major step forward, allowing apps on HBase to deliver 99.99% uptime guarantees. This blog takes a look […]

The Hadoop Distributed File System (HDFS) is the reliable and scalable data storage core of the Hortonworks Data Platform (HDP). In HDP, HDFS and YARN combine to form the distributed operating system for your data platform, providing resource management for diverse workloads and scalable data storage for the next generation of analytical applications. In this […]

Last year on December 11th, Hortonworks presented the sixth of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Carter Shanklin and Enis Soztutar hosted this 6th webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data […]

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed […]

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published in 2014 on our website, the following were most popular: Improving Spark for Data Pipelines with Native YARN Integration. Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improved Apache Spark, which with the help of the pluggable […]