The Hortonworks Blog

Apache Storm and YARN extend Hadoop to handle real-time processing of data and provide the ability to process and respond to events as they happen. Our customers have told us about many use cases for this technology combination, and below we present a demo example complete with code so you can try it yourself.

For the demo below, we used our Sandbox VM, which is a full implementation of the Hortonworks Data Platform.…

Hortonworks will be making a preview of Apache Storm integration available in Q4 of this year and will be including Apache Storm in the Hortonworks Data Platform in the first half of 2014.

Any time now, the Apache Hadoop community will declare the General Availability of Hadoop 2.0 which includes the much anticipated Apache Hadoop YARN.  The YARN-based architecture of Hadoop 2 is the most significant change to Hadoop introduced in the past six years and enables Hadoop to expand from a single-purpose, batch-oriented data platform based on MapReduce into a truly multi-purpose platform supporting a wide range of data processing approaches.…

As part of a modern data architecture, Hadoop needs to be a good citizen and trusted as part of the heart of the business. This means it must provide for all the platform services and features that are expected of an enterprise data platform.

The Hadoop Distributed File System is the rock at the core of HDP and provides reliable, scalable access to data for all analytical processing needs. With HDP 2.0, HDFS now has automated failover with a hot standby and full-stack resiliency, built into the platform itself.…

Security is one of the biggest topics in Hadoop right now. Historically Hadoop has been a back-end system accessed only by a few specialists, but the clear trend is for companies to put data from Hadoop clusters in the hands of analysts, marketers, product managers or call center employees whose numbers could be in the hundreds or thousands. Data security and privacy controls are necessary before this transformation can occur. HDP2, through the next release of Apache Hive, introduces a very important new security feature that allows you to encrypt the traffic that flows between Hadoop and popular analytics tools like MicroStrategy, Tableau, Excel and others.…

I’ve been working on MapReduce frameworks since mid 2005 (Hadoop’s since the start of 2006) and a fundamental feature has always been incredible throughput to access data, but no ACID transactions. That is changing.

Recently, a customer that is using Apache Hive to process terabytes (and growing quickly) of sales data asked how to handle a business requirement to update millions of records in their sales table each day.…

This post is the fifth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Case Study: Automatic Reduce Parallelism
Motivation

Distributed data processing is dynamic by nature and it is extremely difficult to statically determine optimal concurrency and data movement methods a priori.…

We’ve been invited to join our partner SAP on October 16 to talk Big Data and how the integrated SAP HANA + Hadoop approach can solve your big data challenges. This chat will be a live Google Hangout with:

  • Irfan Khan, SVP & GM SAP Global Big Data at SAP (@i_kHANA)
  • Ari Zilka, CTO at Hortonworks (@ikarzali)
  • Timo Elliott, Innovation Evangelist at SAP (@timoelliott)

When: Wednesday, October 16, 8am PT / 11am ET / 5pm CET…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Mahadev Konar discusses Apache ZooKeeper, the open source Apache project that is used to coordinate various processes on a Hadoop cluster (such as electing a leader between two processes).

Mahadev was on the team at Yahoo! in 2006 that started developing what became Apache Hadoop. He has been involved with Apache ZooKeeper since 2008, when the project was open sourced.…

This post is from Vinod Kumar Vavilapalli of Hortonworks and Chris Douglas and Carlo Curino of Microsoft Research.

Great news from the Apache Hadoop YARN community! A paper describing Apache Hadoop YARN was accepted at the 2013 ACM Symposium on Cloud Computing (SoCC 2013), where it won the award for best paper! Here’s the title and abstract:

Title

Apache Hadoop YARN: Yet Another Resource Negotiator [Industrial Paper]

Abstract

The initial design of Apache Hadoop was tightly focused on running massive, MapReduce jobs to process a web crawl.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Enis Soztutar discusses Apache HBase, built for random read/write access to data in billions of rows and millions of columns.

Enis began using Apache Hadoop in 2006. Now, Enis is a Hortonworks engineer and Apache HBase project management chair. He has also been a committer to Apache Hadoop since 2007 and to HBase since 2012.…

This post is the fourth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

The previous couple of blogs covered Tez concepts and APIs.…

This is a guest blog post from our partner, Actuate. They’ve been generous enough to create some great Hadoop tutorials on the Open Source BIRT project that use the Hortonworks Sandbox.

By now, Apache™ Hadoop® has become synonymous with the first stage of Big Data: storing, processing and managing huge volumes and varieties of structured and unstructured data. Yet the data stored by Hadoop remains unreadable to the average business user.…

A crucial requirement of any enterprise technology is to ensure the simplest possible management and operation. We think that simplicity means two things: 1) integration with existing infrastructure and tools and 2) leveraging existing knowledge and skills.

Download the beta release of Ambari SCOM Management Pack here.

Ambari (http://incubator.apache.org/ambari/) was introduced as an Apache incubator project with the aim of developing the best management tool for Hadoop applying our principles of open source community development for rapid innovation and solving the right problems for enterprises.…

Thanks to all those who joined in person and virtually for the Apache Ambari Meetup at Hortonworks this week. We talked tech, we saw demos, we laughed, we cried, we ate pizza.

The central theme of the night was the newly added support for Hadoop 2. Ambari now has:

  • Hadoop 2 Stack: Ambari adds support for installing, managing and monitoring a Hadoop 2 Stack.
  • NameNode HA: Ambari configures NameNode High Availability based on the QJM support built into HDFS2.
  • YARN: Ambari manages YARN Service lifecycle and automatically deploys the MapReduce2 framework.

Personally, I’ve followed the Go Programming Language (golang) with increasing interest for a while and have been itching to really sink my teeth into it. I’ve always felt you never learn any programming language for real unless it’s used to build a fairly large, real-world solution. It’s the only way to tackle real issues and gain some confidence for future battles with destiny… FTR, my first real project in Java was Hadoop, circa 2006.…
