From the Dev Team

Enterprises across all major industries adopt Apache Hadoop for its ability to store and process an abundance of new types of data in a modern data architecture. This “Any Data” capability has always been a hallmark feature of Hadoop, opening insight from new data sources such as clickstream, web and social, geo-location, IoT, server logs, […]

Hortonworks is pleased to announce the general availability of Apache Spark in Hortonworks Data Platform (HDP), now available on our downloads page. With HDP 2.2.4, Hortonworks now offers support for your developers and data scientists using Apache Spark 1.2.1. HDP’s YARN-based architecture enables multiple applications to share a common cluster and dataset while ensuring consistent […]

Hortonworks Data Platform (HDP) provides centralized enterprise services for comprehensive security, enabling end-to-end protection, access, compliance and auditing of data in motion and at rest. HDP’s centralized architecture, with Apache Hadoop YARN at its core, also provides consistent operations for provisioning, management, monitoring and deployment of Hadoop clusters for a reliable enterprise-ready data lake. But […]

The recent post by Jayush Luniya announced the community release of Apache Ambari 2.0. One of the three key Ambari features that Jayush discussed was Rolling Upgrades, enabling Hadoop operators to upgrade from one version of HDP to the next, with minimal disruption to the cluster. The Hortonworks development team worked long and hard to […]

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of an HDFS cluster. See this previous post for an introduction on enterprise-grade rolling upgrades in HDP 2.2. […]

Advances in Hadoop security, governance and operations have accelerated adoption of the platform by enterprises everywhere. Apache Ambari is the open source operational platform for provisioning, managing and monitoring Hadoop clusters from a single pane of glass, and with the Apache Ambari 1.7.0 release last year, Ambari made it far easier for enterprises to adopt […]

Hortonworks is excited to announce that our first hands-on, performance-based certification exam is now available! The HDP Certified Developer (HDPCD) exam is designed for Hadoop developers working with frameworks like Pig, Hive, Sqoop and Flume. This new approach to Hadoop certification is designed to give individuals an opportunity to prove their Hadoop skills in […]

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: This is the third part of the blog-post series about anomaly detection from healthcare data. In part 1, we described the dataset, the business use-case and our general approach […]

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations in Apache Ambari have focused on opening it up into a pluggable management platform that can automate cluster provisioning, deploy third-party software and provide custom operational and developers’ views to the end user. Join us Thursday March […]

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider gathered to hear what’s happening. There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview […]

Introduction: Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and to continue functioning flawlessly while software is being upgraded. […]

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to […]

Apache Hive is the de facto standard for SQL in Hadoop, with more enterprises relying on this open source project than on any alternative. Stinger.next, a community-based effort, is delivering true enterprise SQL at Hadoop scale and speed. With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise […]

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed long-running services. In this release, we also introduced a […]

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: PageRank[1] is the poster child of graph algorithms, used by Google in its original search engine system to determine which web pages are most influential. The incredible success of PageRank led […]