The Hortonworks Blog

Hortonworks Data Platform 1.1 Brings Expanded High Availability and Streaming Data Capture, Easier Integration with Existing Tools to Improve Enterprise Reliability and Performance of Apache Hadoop

It is exactly three months to the day that Hortonworks Data Platform version 1.0 was announced. A lot has happened since that day…

  • Our distribution has been downloaded by thousands and is delivering big value to organizations throughout the world,
  • Hadoop Summit gathered over 2200 Hadoop enthusiasts into the San Jose Convention Center,
  • And, our Hortonworks team grew by leaps and bounds!

Partner Webinar Series

Hortonworks boasts a rich and vibrant ecosystem of partners representing a huge array of solutions that leverage Hadoop, and specifically Hortonworks Data Platform, to provide big data insights for customers. The goal of our Partner Webinar Series is to help communicate the value and benefit of our partners’ solutions and how they connect and use Hortonworks Data Platform.

Look to the Clouds

Setting up a big data cluster can be difficult, especially considering the assembly of all the all the equipment, power, and space to make it happen.…

Other posts in this series: Introducing Apache Hadoop YARN Apache Hadoop YARN – Background and an Overview Apache Hadoop YARN – Concepts and Applications Apache Hadoop YARN – ResourceManager Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – NodeManager

The NodeManager (NM) is YARN’s per-node agent, and takes care of the individual compute nodes in a Hadoop cluster. This includes keeping up-to date with the ResourceManager (RM), overseeing containers’ life-cycle management; monitoring resource usage (memory, CPU) of individual containers, tracking node-health, log’s management and auxiliary services which may be exploited by different YARN applications.…

Twitter Analytics presented their distributed infrastructure, including Hadoop and Pig, at a UC Berkeley iSchool special course called INFO 290: Analyzing Big Data with Twitter. Twitter is a major contributor to many Apache projects. The course was over-subscribed and was a great success, as students got to learn from practicing data scientists using Hadoop on truly massive datasets. The entire lecture series is available here.

Bill Graham @billgraham, a Data Systems Engineer at Twitter Analytics and Apache Pig committer, presented an Introduction to Hadoop.…

Series Introduction

Hortonworks is on a mission to accelerate the development and adoption of Apache Hadoop. Through engineering open source Hadoop, our efforts with our distribution, Hortonworks Data Platform (HDP), a 100% open source data management platform, and partnerships with the likes of Microsoft, Teradata, Talend and others, we will accomplish this, one installation at a time.

What makes this mission possible is our all-star team of Hadoop committers. In this series, we’re going to profile those committers, to show you the face of Hadoop.…

During the ‘Future of Apache Hadoop’ webinar series, Hortonworks founders and core committers will discuss the future of Hadoop and related projects including Apache Pig, Apache Ambari, Apache Zookeeper and Apache Hadoop YARN.

Apache Hadoop has rapidly evolved to become the leading platform for managing, processing and analyzing big data. Consequently there is a thirst for knowledge on the future direction for Hadoop related projects. The Hortonworks webinar series will feature core committers of the Apache projects discussing the essential components required in a Hadoop Platform, current advances in Apache Hadoop, relevant use-cases and best practices on how to get started with the open source platform.…

Introduction

In this post, Hortonworks Intern Jie Li talks about his work this summer on performance analysis and optimization of Apache Pig. Jie is a PhD candidate in the Department of Computer Science at Duke University. His research interests are in the area of database systems and big data computing. He is currently working with Associate Professor Shivnath Babu.

Pig Performance Analysis and Optimization

I am proud that I was among the first several interns at Hortonworks, one of the leaders in the Hadoop community.…

Other posts in this series: Introducing Apache Hadoop YARN Apache Hadoop YARN – Background and an Overview Apache Hadoop YARN – Concepts and Applications Apache Hadoop YARN – ResourceManager Apache Hadoop YARN – NodeManager

Apache Hadoop YARN – ResourceManager

As previously described, ResourceManager (RM) is the master that arbitrates all the available cluster resources and thus helps manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers (NMs) and the per-application ApplicationMasters (AMs).…

The August Pig Hackathon brought Pig users from Hortonworks, Yahoo, Cloudera, Visa, Kaiser Permanente, and LinkedIn to Hortonworks HQ in Sunnyvale, CA to talk and work on Apache Pig.

Jonathan Coveney and Bill Graham from Twitter walked newer Pig users through how Pig translates a Pig Latin script to map reduce jobs and went over how to read the output of explain. For those interested, Hortonworks founder Alan Gates covers this in Chapter 1 of Programming Pig.…

Introduction

A Highly Available NameNode for HDFS has been in development since last year. That effort focused singularly on the automatic failover of the NameNode for Hadoop 2.0. During that time we have realized two things.

First, we realized we should use an outside-in approach to the HA problem: start by designing the availability of the Hadoop system as a whole and then focus on the high-availability of individual components; that work lead to the Full Stack HA Architecture.…

Series Introduction

Apache Pig is a dataflow oriented, scripting interface to Hadoop. Pig enables you to manipulate data as tuples in simple pipelines without thinking about the complexities of MapReduce.

But Pig is more than that. Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems to enable you to process data from wherever and to wherever you like.…

Pre-crime? Pretty close…

If you have seen the futuristic movie Minority Report, you most likely have an idea of how many factors and decisions go into crime prevention. Yes, Pre-crime is an aspect of the future but even today it is clear that many social, economic, psychological, racial, and geographical circumstances must be thoroughly considered in order to make crime prediction even partially possible and accurate. The predictive analytics made possible with Apache Hadoop can significantly benefit this area of government security.…

This is the first part of a series written by Charles Boicey from the UC Irvine Medical Center.  The series will demonstrate a real case study for Apache Hadoop in healthcare and also journal the architecture and technical considerations presented during implementation.

With a single observation in early 2011, the Hadoop strategy at UC Irvine Medical Center started. While using Twitter, Facebook, LinkedIn and Yahoo we came to the conclusion that healthcare data although domain specific is structurally not much different than a tweet, Facebook posting or LinkedIn profile and that the environment powering these applications should be able to do the same with healthcare data.…

This week, I spent some time and enjoyed speaking at the Softgrid 2012 conference in San Francisco. It was a great collection of speakers and attendees and opened my eyes to some Hadoop driven possibilities that not only differentiate utilities companies but will also transform our day-to-day lives.

The conference focused on software (in this case intelligent analytics) as a competitive advantage to enable value and growth for utilities.  These often large and historically conservative organizations have moved beyond the notion that their sole business is to distribute electric power efficiently, reliably, and cost-effectively to consumers.…

Do you want to understand how Apache Hadoop can benefit your business? Do you understand the relationship between Hadoop and your Big Data initiatives? Are you struggling to explain the benefits of Hadoop to your management team?

At Hortonworks, we are constantly being asked by business and executive audiences to explain use cases, benefits and components of Hadoop. While the interest in Big Data and Hadoop grows, this urgent and often pressing demand for a map to create value and differentiation amplifies.…

Go to page:« First...102030...3435363738...Last »