The Hortonworks Blog

If you've been following #hadoopsummit on Twitter, you might have noticed some excitement around the Community Choice, a public voting system that gives the entire Apache Hadoop community a say in the sessions chosen for #hadoopsummit EU. Anyone can vote, and the top vote-getters in each track will automatically be included in the #hadoopsummit EU agenda, March 20-21, 2013.

If you’re still deciding which sessions, in which tracks, should be so lucky to get your vote, I have one for your consideration.…

The Hadoop Summit Europe official call for papers ends this Friday, November 30th – so be sure to get your session submissions in this week!

Hadoop Summit Europe is March 20-21 at the Beurs van Berlage in Amsterdam, Netherlands. There is still time to submit an abstract!

The four content tracks are:

Applied Hadoop

Sessions in this track focus on applications, tools, algorithms and data science as well as areas of advanced research and emerging applications that use and extend the Hadoop platform.…

Thankful…

Happy Thanksgiving!

Today, like the rest of the U.S., we take a pause from our regular blog schedule to give thanks…

We are thankful for mappers and reducers. We are thankful for namenodes and jobtrackers. We give thanks to speculative execution battling the march of the last reducer. Give thanks to every petabyte, terabyte, gigabyte, file and block of data. We are thankful for the capacity scheduler.

We are very thankful for many things here at Hortonworks and I know many of us are thankful for an extra long weekend.…

Track Chairs have been named for Hadoop Summit Europe. Each Track Chair will, in turn, select a track committee, and as a team they will decide which sessions are presented at Hadoop Summit Europe. The Track Chairs are as follows:

Operating Hadoop – Evert Lammerts, SARA

I joined SARA as a technical consultant in October 2008. In 2009 I started experimenting with non-traditional distributed processing and storage platforms, mainly Hadoop. I'm currently the lead for Hadoop and related big data services.…

Introduction

Packetpig is the tool behind Packetloop. In Part One of the Introduction to Packetpig, I discussed the background and motivation behind the Packetpig project and the problems that Big Data Security Analytics can solve. In this post I want to focus on the code and teach you how to use our building blocks to start writing your own jobs.

The ‘building blocks’ are the Packetpig custom loaders that allow you to access specific information in packet captures.…

Apache ZooKeeper™ release 3.4.5 is now available. This is a bug-fix release containing three fixes. The critical issues fixed in this release are:

ZOOKEEPER-1550: ZooKeeperSaslClient does not finish anonymous login on OpenJDK

ZOOKEEPER-1376: zkServer.sh does not correctly check for $SERVER_JVMFLAGS

ZOOKEEPER-1560: Zookeeper client hangs on creation of large nodes.

Stability of 3.4.5

Note that Apache ZooKeeper™ 3.4.5 is marked as the current stable release.…

A recurrent question on the various Hadoop mailing lists is "why does Hadoop prefer a set of separate disks to the same set managed as a RAID-0 disk array?"

It’s about time and snowflakes.

JBOD and the Allure of RAID-0

In Hadoop clusters, we recommend treating each disk separately, in a configuration that is known, somewhat disparagingly, as "JBOD": Just a Box of Disks.

In comparison, RAID-0 (which is a bit of a misnomer, as there is no redundancy) stripes data across all the disks in the array.…
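The striping described above can be pictured with a toy model. This is an illustrative assumption, not Hadoop or RAID-controller code: it places stripe units round-robin across the disks and, assuming large sequential reads and enough concurrent tasks to keep every JBOD disk busy, estimates aggregate read throughput. Because a RAID-0 stripe is complete only once every disk has served its part, the array moves at the pace of its slowest member, whereas independent JBOD disks simply add up. The function names and disk speeds are hypothetical.

```python
"""Toy model, not Hadoop code: RAID-0 striping versus JBOD.

Assumptions: large sequential reads, equal-sized stripe units, and, for JBOD,
enough concurrent tasks to keep every disk busy.
"""


def raid0_stripe(num_units: int, num_disks: int) -> list[list[int]]:
    """Place stripe units round-robin, so one file is spread over all disks."""
    disks: list[list[int]] = [[] for _ in range(num_disks)]
    for unit in range(num_units):
        disks[unit % num_disks].append(unit)
    return disks


def raid0_throughput(disk_mb_s: list[float]) -> float:
    """Every stripe needs every disk, so the array runs at the slowest disk's pace."""
    return len(disk_mb_s) * min(disk_mb_s)


def jbod_throughput(disk_mb_s: list[float]) -> float:
    """Each disk serves its own tasks independently, so their rates add up."""
    return sum(disk_mb_s)


if __name__ == "__main__":
    print(raid0_stripe(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
    speeds = [100.0, 100.0, 100.0, 70.0]  # hypothetical: one disk 30% slower
    print(raid0_throughput(speeds))  # 280.0
    print(jbod_throughput(speeds))  # 370.0
```

With one disk 30% slower than the others, this model gives 280 MB/s for the RAID-0 array against 370 MB/s for the same disks used as JBOD. It deliberately ignores seeks, caching and failure handling.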

Hackathon and Aeromuseum Reception

ApacheCon Europe kicked off yesterday with an all-day Hackathon, followed by a committers' reception at the Sinsheim Technik Museum, which has, among other large aircraft, a Concorde in Air France livery. My favorite was the diesel engine from a U-boat, with its enormous drive shaft and pistons.

Taking the Guesswork out of Hadoop Infrastructure

Winding a rented Opel through its gears along village roads for half an hour from my hotel-out-of-a-Black-Forest-fairy-tale, I made it to ApacheCon EU's first day of sessions midway through the first talk, Steve Watt's 'Taking the Guesswork out of Hadoop Infrastructure.' Steve described the harsh reality of fitting hardware to a given Hadoop workload with this quote: "We've profiled our Hadoop applications so we know what type of infrastructure we need." Said no one, ever.…

Agile Data hits the road this month, crossing Europe with the good news about Hadoop and teaching Hadoop users how to build value from their data by building analytics applications on Hadoop.

We'll be giving out discount coupons to Hadoop Summit Europe, which is March 20-21 in Amsterdam!

  • 11/3 – Agile Data @ The Warsaw Hadoop Users Group
  • 11/5 to 11/6 – Attending ApacheCon Europe 2012 in Sinsheim, Germany. Say hello!
  • 11/7 – Agile Data @ The France Hadoop Users Group in Paris
  • 11/8 – Agile Data @ Netherlands Hadoop Users Group in Utrecht
  • 11/12 – Agile Data @ Hadoop Users Group UK in London.

You don't see many demos like the one Shawn Bice (Microsoft) gave today in the Regent Parlor of the New York Hilton at Strata NYC. "Drive Smarter Decisions with Microsoft Big Data" was different.

For starters, everything worked like clockwork. Live demos of new products are notorious for failing on stage, even when the products work in production. And although Microsoft was presenting a Java-based platform at a largely open-source event, it was standing room only, with the crowd overflowing out the doors.…

This guest blog post from Microsoft's Dave Campbell explains in more detail why Microsoft chose Hortonworks for HDInsight.

Last February, at the Strata Conference in Santa Clara, we shared Microsoft's progress on Big Data: our work to broaden the adoption of Hadoop with the simplicity and manageability of Windows, and to enable customers to easily derive insights from their structured and unstructured data through familiar tools like Excel.

Hortonworks is a recognized pioneer in the Hadoop community and a leading contributor to the Apache Hadoop project, and that's why we're excited to announce our expanded partnership with Hortonworks to give customers access to an enterprise-ready distribution of Hadoop that is 100 percent compatible with Windows Server and Windows Azure. …

As we speed towards widespread enterprise adoption of Apache Hadoop, it has become readily apparent that this new data platform must not only capture, process and distribute data but must also be deployable in a variety of ways: on premises, in a VM, as an appliance or, better yet, in the cloud…

Today we announced a new relationship with Rackspace in which we will develop an OpenStack-based Hadoop solution for the public and private cloud.…

This is Russell Jurney, your Big Data reporter on the ground here at Strata NYC/Hadoop World at the New York Hilton. Monday night's main event was Big Data Camp. As at any unconference, the best action was in the hallway, meeting people you only know by reputation or from Twitter. Highlights were:

  • Microsoft's demonstration of Excel, Power Pivot and the Hortonworks Data Platform
      ◦ In light of today's announcement, the Hadoop market just got MUCH bigger.
  • Druid: Real-Time Analytics at a Billion Rows Per Second, by Eric Tschetter, co-founder of Metamarkets
      ◦ In-RAM stores are an interesting new development: as RAM gets cheaper, they can augment a Hadoop-centric workload.

At Hortonworks, we fundamentally believe that, in the not-so-distant future, Apache Hadoop will process over half of the data flowing through the world's businesses. We realize this is a BOLD vision that will take a lot of hard work, not only by Hortonworks and the open source community but also by software, hardware, and solution vendors focused on the Hadoop ecosystem, as well as by end users deploying platforms powered by Hadoop.

If the vision is to be achieved, we need to accelerate the process of enabling the masses to benefit from the power and value of Apache Hadoop in ways that leave them virtually oblivious to the fact that Hadoop is under the hood.…

As we have said before, Hortonworks has been steadily increasing its investment in HBase, and HBase adoption in the enterprise keeps growing. To continue this trend, we feel HBase needs investment in the following areas:

  • Reliability and High Availability (all data is always available, and recovery from failures is quick)
  • Autonomous operation (minimal operator intervention)
  • Wire compatibility (to support rolling upgrades across at least a couple of versions)
  • Cross data-center replication (for disaster recovery)
  • Snapshots and backups (the ability to take periodic snapshots of some or all tables and restore them later if required)
  • Monitoring and Diagnostics (for example, which RegionServer is hot or what caused an outage)

Significant work has happened in each of these areas in the 0.94 and 0.96 (currently trunk) branches.…
