cta

Get Started

cloud

Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

From the Dev Team

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction This is the third part of the blog-post series about anomaly detection from healthcare data. In part 1, we described the dataset, the business use-case and our general approach […]

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers’ views to the end user. Join us Thursday March […]

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in the Apache Slider congregated to hear what’s happening. There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview […]

Introduction Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and flawlessly continue to function while software is being upgraded. […]

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to […]

Apache Hive is the de facto standard for SQL in Hadoop with more enterprises relying on this open source project than any other alternative. Stinger.next, a community based effort, is delivering true enterprise SQL at Hadoop scale and speed. With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise […]

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed Long-Running services. In this release, we also introduced a […]

This three part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction PageRank[1]is the poster-child of graph algorithms, used by Google in its original search engine system to determine which web pages are most influential. The incredible success of PageRank led […]

This is the second post in a series exploring the theme of long-running service workloads in YARN. See for the introductory post. Long-running services deployed on YARN are by definition expected to run for a long period of time—in many cases forever. Services such as Apache™ HBase, Apache Accumulo and Apache Storm can be run […]

Analysts and data scientists⎯not to mention business executives⎯want Big Data not for the sake of the data itself, but for the ability to work with and learn from that data. As other users become more savvy, they also want more access. But too many inefficient queries can create a bottleneck in the system. The good […]

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to […]

As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines. Until recently, optimizing Hive queries focused mostly on data layout techniques such as partitioning and bucketing or using custom file formats. In the last couple of years, driven largely by the innovation of […]

The Apache HBase community has released Apache HBase 1.0.0. Seven years in the making, it marks a major milestone in the Apache HBase project’s development, offers some exciting features and new API’s without sacrificing stability, and is both on-wire and on-disk compatible with HBase 0.98.x. In this blog, which is a cross post from from […]

Hortonworks Data Platform’s YARN-based architecture enables multiple applications to share a common cluster and data set while ensuring consistent levels of response made possible by a centralized architecture. Hortonworks led the efforts to on-board open source data processing engines, such as Apache Hive, HBase, Accumulo, Spark, Storm and others, on Apache Hadoop YARN. In this […]

Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets […]