cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

From the Dev Team

The recent post by Jayush Luniya announced the community release of Apache Ambari 2.0. One of the three key Ambari features that Jayush discussed was Rolling Upgrades, enabling Hadoop operators to upgrade from one version of HDP to the next, with minimal disruption to the cluster. The Hortonworks development team worked long and hard to […]

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a HDFS cluster. See this previous post for an introduction on enterprise-grade rolling upgrades in HDP 2.2. […]

Advances in Hadoop security, governance and operations have accelerated adoption of the platform by enterprises everywhere. Apache Ambari is the open source operational platform for provisioning, managing and monitoring Hadoop clusters from a single pane of glass, and with the Apache Ambari 1.7.0 release last year, Ambari made it far easier for enterprises to adopt […]

Hortonworks is excited to announce that our first hands-on, performance based certification exam is now available! The HDP Certified Developer (HDPCD) exam is designed for Hadoop developers working with frameworks like Pig, Hive, Sqoop and Flume. This new approach to Hadoop certification is designed to allow individuals an opportunity to prove their Hadoop skills in […]

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction This is the third part of the blog-post series about anomaly detection from healthcare data. In part 1, we described the dataset, the business use-case and our general approach […]

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers’ views to the end user. Join us Thursday March […]

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in the Apache Slider congregated to hear what’s happening. There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview […]

Introduction Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and flawlessly continue to function while software is being upgraded. […]

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to […]

Apache Hive is the de facto standard for SQL in Hadoop with more enterprises relying on this open source project than any other alternative. Stinger.next, a community based effort, is delivering true enterprise SQL at Hadoop scale and speed. With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise […]

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed Long-Running services. In this release, we also introduced a […]

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction PageRank[1]is the poster-child of graph algorithms, used by Google in its original search engine system to determine which web pages are most influential. The incredible success of PageRank led […]

This is the second post in a series exploring the theme of long-running service workloads in YARN. See for the introductory post. Long-running services deployed on YARN are by definition expected to run for a long period of time—in many cases forever. Services such as Apache™ HBase, Apache Accumulo and Apache Storm can be run […]

Analysts and data scientists⎯not to mention business executives⎯want Big Data not for the sake of the data itself, but for the ability to work with and learn from that data. As other users become more savvy, they also want more access. But too many inefficient queries can create a bottleneck in the system. The good […]

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to […]