The Hortonworks Blog

Posts categorized by: Stream Data

We are excited to announce that Apache Kafka 0.8.1.1 is now available as a technical preview with Hortonworks Data Platform 2.1. Kafka was originally developed at LinkedIn and incubated as an Apache project in 2011. It graduated to a top-level Apache project in October of 2012.

Many organizations already use Kafka for their data pipelines, including Hortonworks customers like Spotify and Tagged.

What is Apache Kafka?

Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system.…

YARN and Apache Storm: A Powerful Combination

YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside many others, coordinated by YARN.

YARN’s architecture makes it much easier for users to build and run multiple applications in Hadoop, all sharing a common resource manager.…

Apache Cassandra is an open source, distributed NoSQL database management system designed to handle large amounts of data. It offers a scalable, real-time solution that lets users create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra and a new Technology Partner of Hortonworks.

Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

Here are summaries of this version’s important fixes and improvements.

New Feature Highlights

Netty Transport Overhaul

Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

Three weeks ago, we announced the availability of the technical preview of Hortonworks Data Platform (HDP) version 2.1, and since then we have had thousands of downloads of this preview. We also promised delivery of GA bits on April 22nd, and we are delighted to deliver as stated. HDP 2.1, which includes countless new features across seven new components, is available today from our download page.

YARN unlocks the Data Lake

YARN, the resource management layer of Hadoop 2, is delivering value as it has unlocked the data lake vision for many.…

The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.

Download HDP 2.1 Technical Preview Now

What’s In Hortonworks Data Platform 2.1?

  • Presentation & Applications. Enable both existing and new applications to provide value to the organization.
  • Enterprise Management & Security. Empower existing operations and security tools to manage Hadoop.…

In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on the processing of data. Hortonworks is already supporting customers using this important project today.

Many organizations already use Storm, including our partner Yahoo! Apache Storm 0.9.1 is:

  • Highly scalable. Like Hadoop, Storm scales linearly.
  • Fault-tolerant. Automatically reassigns tasks if a node fails.
  • Reliable. 

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how they will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second the data is created, followed by decades of historical analysis of that data stored in HDFS.…

In October, we announced our intent to include and support Storm as part of Hortonworks Data Platform. With this commitment, we also outlined and proposed an open roadmap to improve the enterprise readiness of this key project. We are committed to doing this with a 100% open source approach, and your feedback is immensely valuable in this process.

Today, we invite you to take a look at our Storm technical preview.…