The Hortonworks Blog

Over the course of 2012, through Hortonworks’ leadership within the Apache Ambari community we have seen the rapid creation of an enterprise-class management platform required for enabling Apache Hadoop to be an enterprise viable data platform.  Hortonworks engineers and the broader Ambari community have been working hard on their latest release, and we’d like to highlight the exciting progress that’s been made to Ambari, a 100% open and free solution that delivers the features required from an enterprise-class management platform for Apache Hadoop.…

The Hortonworks Data Platform (HDP) conveniently integrates numerous Big Data tools in the Hadoop ecosystem. As such, it provides cluster-oriented storage, processing, monitoring, and data integration services. HDP simplifies the deployment and management of a production Hadoop-based system.

In Hadoop, data is represented as key/value pairs. In HBase, data is represented as a collection of wide rows. These atomic structures makes global data processing (via MapReduce) and row-specific reading/writing (via HBase) simple.…

For the last couple months, Hortonworks has been excited to be a proud sponsor of the Big Analytics 2012 roadshow.  These roadshows have provided us some great insights into the role of Apache Hadoop in this emerging Big Data market.  We had some great discussions with attendees regarding their current and future plans for the use of Hadoop and other Big Data technologies. Another interesting insight was the need for Data skills, people who know what to ask of that data and how to use tools like Hadoop to provide patterns, answers, interpretations and present the data.…

If you been following #hadoopsummit on twitter you might have noticed some excitement around the community choice, a public voting system that enables the entire Apache Hadoop community to have a say in the sessions chosen for #hadoopsummit EU. Anyone can vote and the top vote getters in each track will automatically be included in the #hadoopsummit EU agenda, March 20-21, 2013.

If you’re still deciding which sessions, in which tracks, should be so lucky to get your vote, I have one for your consideration.…

The Hadoop Summit Europe official call for papers ends this Friday, November 30th – so be sure to get your session submissions in this week!

Hadoop Summit Europe is March 20, 21 at the Beurs van Berlage in Amsterdam, Netherlands. You still have time to submit an abstract now!

The four content tracks are:

Applied Hadoop

Sessions in this track focus on applications, tools, algorithms and data science as well as areas of advanced research and emerging applications that use and extend the Hadoop platform.…

Thankful…

Happy Thanksgiving!

Today, like the rest of the U.S., we take a pause from our regular blog schedule to give thanks…

We are thankful for mappers and reducers. We are thankful for namenodes and jobtrackers. We give thanks to speculative execution battling the march of the last reducer. Give thanks to every petabyte, terabyte, gigabyte, file and block of data. We are thankful for the capacity scheduler.

We are very thankful for many things here at Hortonworks and I know many of us are thankful for an extra long weekend.…

Track Chairs have been named for Hadoop Summit Europe. Track Chairs will, in turn, select their track committees who, as a team, will decide which sessions are to be presented at Hadoop Summit Europe. They are as follow:

Operating Hadoop – Evert Lammerts, SARA

I joined Sara as a technical consultant in October 2008. In 2009 I started experimenting with non-traditional distributed processing and storage platforms, mainly Hadoop. I’m currently the lead Hadoop and related big data services.…

Introduction

Packetpig is the tool behind Packetloop. In Part One of the Introduction to Packetpig I discussed the background and motivation behind the Packetpig project and problems Big Data Security Analytics can solve. In this post I want to focus on the code and teach you how to use our building blocks to start writing your own jobs.

The ‘building blocks’ are the Packetpig custom loaders that allow you to access specific information in packet captures.…

Apache ZooKeeper™ release 3.4.5 is now available. This is a bug fix release including 3 bug fixes. Following is a summary of the critical issues fixed in the release.

ZOOKEEPER-1550: ZooKeeperSaslClient does not finish anonymous login on OpenJDK

ZOOKEEPER-1376: zkServer.sh does not correctly check for $SERVER_JVMFLAGS

ZOOKEEPER-1560: Zookeeper client hangs on creation of large nodes.

Stability of 3.4.5

Note that Apache ZooKeeper™ 3.4.5 is marked as the current stable release.…

A recurrent question on the various Hadoop mailing lists is “why does Hadoop prefer a set of separate disks to the same set managed as a RAID-0 disks array?”

It’s about time and snowflakes.

JBOD and the Allure of RAID-0

In Hadoop clusters, we recommend treating each disk separately, in a configuration that is known, somewhat disparagingly as “JBOD”: Just a Box of Disks.

In comparison RAID-0, which is a bit of misnomer, there being no redundancy, stripes data across all the disks in the array.…

Hackathon and Aeromuseum Reception

ApacheCon Europe kicked off yesterday with an all-day Hackathon followed by a committer’s reception at the Sinsheim Technik Museum, which has – among other large aircraft, a Concorde in Air France livery. My favorite was the diesel engine from a U-Boat – and its enormous drive-shaft and pistons.

Taking the Guesswork out of Hadoop Infrastructure

Winding a rented Opal through its gears along village roads for half an hour from my hotel-out-of-a-black-forest-fairy-tale, I made it to ApacheCon EU’s first day of sessions mid-way through the first talk by Steve Watt, ‘Taking the Guesswork out of Hadoop Infrastructure.’ Steve talked about the harsh reality of fitting hardware to a given workload using Hadoop with the quote: “We’ve profiled our Hadoop applications so we know what type of infrastructure we need.” — Said No One, Ever.…

Agile Data hits the road this month, crossing Europe with the good news about Hadoop and teaching Hadoop users how build value from data using Hadoop to build analytics applications.

We’ll be giving out discount coupons to Hadoop Summit Europe, which is March 20-21st in Amsterdam!

  • 11/3 – Agile Data @ The Warsaw Hadoop Users Group
  • 11/5 to 11/6 – Attending ApacheCon Europe 2012 in Sinsheim, Germany. Say hello!
  • 11/7 – Agile Data @ The France Hadoop Users Group in Paris
  • 11/8 – Agile Data @ Netherlands Hadoop Users Group in Utrecht
  • 11/12 – Agile Data @ Hadoop Users Group UK in London.…
  • You don’t see many demos like the one given by Shawn Bice (Microsoft) today in the Regent Parlor of the New York Hilton, at Strata NYC. “Drive Smarter Decisions with Microsoft Big Data,” was different.

    For starters – everything worked like clockwork. Live demos of new products are notorious for failing on-stage, even if they work in production. And although Microsoft was presenting about a Java-based platform at a largely open-source event… it was standing room only, with the crowd overflowing out the doors.…

    This guest blog post is from Microsoft’s Dave Campbell providing more details on why they chose Hortonworks for  HDInsights.

    Last February at Strata Conference in Santa Clara we shared Microsoft’s progress on Big Data, specifically working to broaden the adoption of Hadoop with the simplicity and manageability of Windows and enabling customers to easily derive insights from their structured and unstructured data through familiar tools like Excel.

    Hortonworks is a recognized pioneer in the Hadoop Community and a leading contributor to the Apache Hadoop project, and that’s why we’re excited to announce our expanded partnership with Hortonworks to give customers access to an enterprise-ready distribution of Hadoop that is 100 percent compatible with Windows Server and Windows Azure. …

    As we speed towards wide spread enterprise adoption of Apache Hadoop, it has become readily apparent that this new data platform must not only capture, process and distribute data, but it also must be able to be deployed in a variety of ways, be it on premise, in a VM, as an appliance or better yet in the cloud…

    Today we announced a new relationship with Rackspace in which we will develop an OpenStack based Hadoop solution for the public and private cloud.…

    Go to page:« First...1020...2829303132...40...Last »