From the Dev Team

Last year on December 11th, Hortonworks presented the sixth of 8 Discover HDP 2.2 webinars: Apache HBase with YARN & Slider for Fast NoSQL Access. Justin Sears, Carter Shanklin and Enis Soztutar hosted this 6th webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data […]

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed […]

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published in 2014 on our website, the following were most popular: Improving Spark for Data Pipelines with Native YARN Integration. Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improved Apache Spark, which with the help of the pluggable […]

Introduction: Apache Ranger provides centralized security for the Enterprise Hadoop ecosystem, including fine-grained access control and a centralized audit mechanism, all essential for Enterprise Hadoop. This blog covers the audit framework options available with Apache Ranger Release 0.4.0 in HDP 2.2 and how they can be configured. The audit framework can be […]

On December 4th, Hortonworks presented the fifth of 8 Discover HDP 2.2 webinars: Apache Kafka and Apache Storm for Stream Data Processing. Taylor Goetz, Rajiv Onat, and Justin Sears hosted this 5th webinar in the series. After Justin Sears set the stage for the webinar by explaining the drivers behind Modern Data Architecture (MDA), Rajiv […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. Apache Storm brings real-time data processing capabilities to help capture […]

Hortonworks architects vertically integrate the projects within our Hadoop distribution with YARN and HDFS in order to enable HDP to span batch, interactive, and real-time workloads across both open source and other data access technologies. In HDP 2.2, we deliver work to vertically integrate Apache Storm, Apache Accumulo and Apache HBase so that all […]

On November 13th, Hortonworks presented the fourth of 8 Discover HDP 2.2 webinars. Rohit Bakhshi, Jitendra Pandey, and Justin Sears hosted this 4th webinar in the series. Rohit Bakhshi and Jitendra Pandey introduced HDP and discussed how to use HDFS as a reliable, scalable, cost-efficient, and fault-tolerant distributed data storage platform for your […]

The Stinger.next initiative, with its focus on transactions, sub-second queries and SQL:2011 Analytics, evolves Apache Hive to allow it to run most of the analytical workloads that are typical within a data warehouse, but now at petabyte scale. The first phase of Stinger.next, delivered in Apache Hive 0.14 and in HDP 2.2, delivers transactions with […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent […]

The architecture of Hortonworks Data Platform (HDP) matches the blueprint for Enterprise Apache Hadoop, with data management, data access, governance, operations and security. This post focuses on one of those core components: security. Specifically, we will focus on Apache Knox Gateway for securing access to the Hadoop REST APIs. Pseudo Federation Provider: This blog will […]

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. More and more independent software vendors (ISVs) are developing applications […]

Introduction: In this 2nd part of the blog post and its accompanying IPython Notebook in our series on Data Science and Apache Hadoop, we continue to demonstrate how to build a predictive model with Apache Hadoop, using existing modeling tools. This time we’ll use Apache Spark and MLlib. Apache Spark is a relatively new […]

Hadoop operations for provisioning, managing and monitoring a cluster are critical to the success of a Hadoop project, and an intuitive and effective set of tooling has become a foundational element of a Hadoop distribution. Within HDP, we provide completely open source Apache Ambari to help you be successful with Hadoop operations. The rate […]

Our customers have many choices of infrastructure on which to deploy HDP: on premises, in the cloud, virtualized, and even as an appliance. Further, our customers have a choice of deploying on Linux and Windows operating systems. You can easily see this creates a complex matrix. At Hortonworks, we believe you should not be limited to just one option […]