The Hortonworks Blog

Posts categorized by: Innovation from Hortonworks

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster.

HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to efficiently store and interact with their data in a single repository, simultaneously using a wide variety of engines.…

Hortonworks provides enterprise Hadoop for the telecommunications service provider, and Hortonworks Data Platform (HDP) is architected from the ground up with the centralized YARN-based architecture and core enterprise services for data governance, security and cluster operations that can revolutionize your telecommunications business.

As the originators of Hadoop, leaders in the developer community, and partners for your success, nobody is better positioned to help you become a data-centric telecommunications enterprise.

Hortonworks supports most of the largest North American carriers.…

As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines.

Until recently, optimizing Hive queries focused mostly on data layout techniques such as partitioning and bucketing or using custom file formats.

In the last couple of years, driven largely by the innovation of the Hive community around the Stinger initiative, Hive query time has improved dramatically, enabling Hive to support both batch and interactive workloads at speed and at scale.…

The Apache HBase community has released Apache HBase 1.0.0. Seven years in the making, it marks a major milestone in the Apache HBase project’s development, offers some exciting features and new APIs without sacrificing stability, and is both on-wire and on-disk compatible with HBase 0.98.x.

In this blog, which is a cross-post from the Apache HBase Blog, we look at the past, present and future of the Apache HBase project.…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

Today Microsoft announced two important new updates to their Azure HDInsight Service with Apache Hadoop 2.6, now available on new clusters.

We are excited to continue working alongside Microsoft to expand deployment options to the Linux operating system for managed Hadoop-as-a-Service Azure HDInsight clusters. The HDInsight on Linux Preview leverages the completely open Apache Ambari framework to deploy, manage and monitor Hadoop clusters on premises or in the cloud.…

Today, SAS and Hortonworks, two long-time partners and innovators in the Big Data and Analytics space, have announced the certification and release of SAS® Data Loader for Hadoop.

Read the guest blog post below and learn more about SAS and Hortonworks’ joint efforts, thanks to Keith Renison, Senior Solutions Architect for SAS Global Technology Practice.

The New Analytics Culture

Let’s talk about three key elements that drive data management for Hadoop.…

Hortonworks Data Platform’s YARN-based architecture enables multiple applications to share a common cluster and data set while ensuring consistent levels of response made possible by a centralized architecture. Hortonworks led the efforts to on-board open source data processing engines, such as Apache Hive, HBase, Accumulo, Spark, Storm and others, on Apache Hadoop YARN.

In this blog, we will focus on one of those data processing engines—Apache Storm—and its relationship with Apache Kafka.…

Talend is a Hortonworks Certified Technology Partner, and our guest blogger today is Shawn James, director, big data business development, Talend. Shawn and Jim Walker, director of product marketing at Hortonworks, are our guest speakers in an upcoming webinar on Feb. 12th.

If you are a data scientist, MapReduce or Hadoop developer, you are in demand given the massive increase in data science-based projects. These projects are being driven by the private sector of course, but also by a public sector that is looking to tackle a new range of use cases using big data.…

In August 2009, the Facebook Data Infrastructure Team published a white paper that outlined a warehousing solution over Hadoop. They called it Hive. Since that time, this project has not only emerged as the de facto standard for SQL in Hadoop but, with the help of the Stinger initiative, has progressed from a batch-only framework with a limited SQL interface to a nearly SQL:2011-compliant, fully interactive SQL query engine.…

Informatica users leveraging HDP are now able to see a complete end-to-end visual data lineage map of everything done through the Informatica platform. In this blog post, Scott Hedrick, director Big Data Partnerships at Informatica, tells us more about end-to-end visual data lineage.

Hadoop adoption continues to accelerate within mainstream enterprise IT and, as always, organizations need the ability to govern their end-to-end data pipelines for compliance and visibility purposes. Working with Hortonworks, Informatica has extended the metadata management capabilities in Informatica Big Data Governance Edition to include data lineage visibility of data movement, transformation and cleansing beyond traditional systems to cover Apache Hadoop.…

DataTorrent is a Hortonworks Certified Technology Partner and YARN Ready, offering an enterprise class real-time streaming platform on Hadoop and Hortonworks Data Platform. Thomas Weise, principal architect at DataTorrent, is our guest blogger today.

A while ago, DataTorrent announced a new initiative to integrate Kafka and YARN under the KOYA project. KOYA was proposed as KAFKA-1754 and was well received by the community.

Why KOYA?

Kafka is becoming increasingly popular as the data bus to move data in and out of Hadoop clusters.…

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs.

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines.…

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most important of those data access applications, the de facto standard for interactive SQL queries over petabytes of data in Hadoop.…

VoltDB is a Certified Hortonworks Technology Partner and the developer of an in-memory relational DBMS capable of supporting high-volume OLTP and real-time analytics with Hortonworks Data Platform. Our guest blogger today is John Piekos, vice president of engineering at VoltDB.

It’s a common phrase here at VoltDB: Streaming Apps are Really Database Apps When You Use a Database that’s Fast Enough.

What does that mean?

We’re seeing a trend: developers are struggling to create interactive, real-time applications on fast streaming data.…