From the Dev Team

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs. HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of […]

In our series on Data Science and Hadoop, predicting airline delays, we demonstrated how to build predictive models with Apache Hadoop, using existing tools. In part 1, we employed Pig and Python; part 2 explored Spark, MLlib and Scala. Throughout the series, the thesis, theme, topic, and algorithms were similar. That is, we wanted to […]

On December 18th, 2014, Hortonworks presented the last of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Jeff Sposetti and Mahadev Konar hosted the last webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data Architecture […]

Apache HBase is the online database natively integrated with Hadoop, making HBase the obvious choice for applications that rely on Hadoop’s scale and flexible data processing. With the Hortonworks Data Platform 2.2, HBase High Availability has taken a major step forward, allowing apps on HBase to deliver 99.99% uptime guarantees. This blog takes a look […]

The Hadoop Distributed File System (HDFS) is the reliable and scalable data storage core of the Hortonworks Data Platform (HDP). In HDP, HDFS and YARN combine to form the distributed operating system for your data platform, providing resource management for diverse workloads and scalable data storage for the next generation of analytical applications. In this […]

Last year on December 11th, Hortonworks presented the sixth of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Carter Shanklin and Enis Soztutar hosted this 6th webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data […]

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed […]

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published in 2014 on our website, the following were most popular: Improving Spark for Data Pipelines with Native YARN Integration. Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improved Apache Spark, which with the help of the pluggable […]

Apache Ranger provides centralized security for the Enterprise Hadoop ecosystem, including fine-grained access control and a centralized audit mechanism, both essential for Enterprise Hadoop. This blog covers the audit framework options available with Apache Ranger Release 0.4.0 in HDP 2.2 and how they can be configured. The audit framework can be […]

On December 4th, Hortonworks presented the fifth of 8 Discover HDP 2.2 webinars: Apache Kafka and Apache Storm for Stream Data Processing. Taylor Goetz, Rajiv Onat, and Justin Sears hosted this 5th webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data Architecture (MDA), Rajiv […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. Apache Storm brings real-time data processing capabilities to help capture […]

Hortonworks architects vertically integrate the projects within our Hadoop distribution with YARN and HDFS so that HDP can span batch, interactive, and real-time workloads across both open source and other data access technologies. In HDP 2.2, we deliver work to vertically integrate Apache Storm, Apache Accumulo and Apache HBase so that all […]

On November 13th, Hortonworks presented the fourth of 8 Discover HDP 2.2 webinars. Rohit Bakhshi, Jitendra Pandey, and Justin Sears hosted this 4th webinar in the series. Rohit Bakhshi and Jitendra Pandey introduced HDP and discussed how to use HDFS as a reliable, scalable, cost-efficient, and fault-tolerant distributed data storage platform for your […]

The Stinger.next initiative, with its focus on transactions, sub-second queries and SQL:2011 Analytics, evolves Apache Hive to allow it to run most of the analytical workloads that are typical within a data warehouse, but now at petabyte scale. The first phase of Stinger.next, shipped in Apache Hive 0.14 and in HDP 2.2, delivers transactions with […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent […]