The Hortonworks Blog

Posts categorized by : Apache Hadoop

Track Chairs have been named for Hadoop Summit Europe. Track Chairs will, in turn, select their track committees who, as a team, will decide which sessions are to be presented at Hadoop Summit Europe. They are as follow:

Operating Hadoop – Evert Lammerts, SARA

I joined Sara as a technical consultant in October 2008. In 2009 I started experimenting with non-traditional distributed processing and storage platforms, mainly Hadoop. I’m currently the lead Hadoop and related big data services.…

Introduction

Packetpig is the tool behind Packetloop. In Part One of the Introduction to Packetpig I discussed the background and motivation behind the Packetpig project and problems Big Data Security Analytics can solve. In this post I want to focus on the code and teach you how to use our building blocks to start writing your own jobs.

The ‘building blocks’ are the Packetpig custom loaders that allow you to access specific information in packet captures.…

Apache ZooKeeper™ release 3.4.5 is now available. This is a bug fix release including 3 bug fixes. Following is a summary of the critical issues fixed in the release.

ZOOKEEPER-1550: ZooKeeperSaslClient does not finish anonymous login on OpenJDK

ZOOKEEPER-1376: zkServer.sh does not correctly check for $SERVER_JVMFLAGS

ZOOKEEPER-1560: Zookeeper client hangs on creation of large nodes.

Stability of 3.4.5

Note that Apache ZooKeeper™ 3.4.5 is marked as the current stable release.…

A recurrent question on the various Hadoop mailing lists is “why does Hadoop prefer a set of separate disks to the same set managed as a RAID-0 disks array?”

It’s about time and snowflakes.

JBOD and the Allure of RAID-0

In Hadoop clusters, we recommend treating each disk separately, in a configuration that is known, somewhat disparagingly as “JBOD”: Just a Box of Disks.

In comparison RAID-0, which is a bit of misnomer, there being no redundancy, stripes data across all the disks in the array.…

Hackathon and Aeromuseum Reception

ApacheCon Europe kicked off yesterday with an all-day Hackathon followed by a committer’s reception at the Sinsheim Technik Museum, which has – among other large aircraft, a Concorde in Air France livery. My favorite was the diesel engine from a U-Boat – and its enormous drive-shaft and pistons.

Taking the Guesswork out of Hadoop Infrastructure

Winding a rented Opal through its gears along village roads for half an hour from my hotel-out-of-a-black-forest-fairy-tale, I made it to ApacheCon EU’s first day of sessions mid-way through the first talk by Steve Watt, ‘Taking the Guesswork out of Hadoop Infrastructure.’ Steve talked about the harsh reality of fitting hardware to a given workload using Hadoop with the quote: “We’ve profiled our Hadoop applications so we know what type of infrastructure we need.” — Said No One, Ever.…

Agile Data hits the road this month, crossing Europe with the good news about Hadoop and teaching Hadoop users how build value from data using Hadoop to build analytics applications.

We’ll be giving out discount coupons to Hadoop Summit Europe, which is March 20-21st in Amsterdam!

  • 11/3 – Agile Data @ The Warsaw Hadoop Users Group
  • 11/5 to 11/6 – Attending ApacheCon Europe 2012 in Sinsheim, Germany. Say hello!
  • 11/7 – Agile Data @ The France Hadoop Users Group in Paris
  • 11/8 – Agile Data @ Netherlands Hadoop Users Group in Utrecht
  • 11/12 – Agile Data @ Hadoop Users Group UK in London.…
  • You don’t see many demos like the one given by Shawn Bice (Microsoft) today in the Regent Parlor of the New York Hilton, at Strata NYC. “Drive Smarter Decisions with Microsoft Big Data,” was different.

    For starters – everything worked like clockwork. Live demos of new products are notorious for failing on-stage, even if they work in production. And although Microsoft was presenting about a Java-based platform at a largely open-source event… it was standing room only, with the crowd overflowing out the doors.…

    At Hortonworks, we fundamentally believe that, in the not-so-distant future, Apache Hadoop will process over half the world’s data flowing through businesses. We realize this is a BOLD vision that will take a lot of hard work by not only Hortonworks and the open source community, but also software, hardware, and solution vendors focused on the Hadoop ecosystem, as well as end users deploying platforms powered by Hadoop.

    If the vision is to be achieved, we need to accelerate the process of enabling the masses to benefit from the power and value of Apache Hadoop in ways where they are virtually oblivious to the fact that Hadoop is under the hood.…

    As we have said here, Hortonworks has been steadily increasing our investment in HBase. HBase’s adoption has been increasing in the enterprise. To continue this trend, we feel HBase needs investments in the areas of:

  • Reliability and High Availability (all data always available, and recovery from failures is quick)
  • Autonomous operation (minimum operator intervention)
  • Wire compatibility (to support rolling upgrades across a couple of versions at least)
  • Cross data-center replication (for disaster recovery)
  • Snapshots and backups (be able to take periodic snapshots of certain/all tables and be able to restore them at a later point if required)
  • Monitoring and Diagnostics (which regionserver is hot or what caused an outage)
  • Significant work has happened in each of the areas outlined above in the 0.94 and 0.96 (currently trunk) branches.…

    HBase is a critical component of the Apache Hadoop ecosystem and a core component of the Hortonworks Data Platform.  HBase enables a host of low latency Hadoop use-cases; As a publishing platform, HBase exposes data refined in Hadoop to outside systems; As an online column store, HBase supports the blending of random access data read/write with application workloads whose data is directly accessible to Hadoop MapReduce.

    The HBase community is moving forward aggressively, improving HBase in many ways.  …

    In this blog, I’ll cover how we tested Full Stack HA with NameNode HA in Hadooop 1 with Hadoop and HBase as components of the stack.

    Yes, NameNode HA is finally available in the Hadoop 1 line. The test was done with Hadoop branch-1 and HBase-0.92.x on a cluster of roughly ten nodes. The aim was to try to keep a really busy HBase cluster up in the face of the cluster’s NameNode repeatedly going up and down.…

    This will be the first and the largest European conference focused exclusively on accelerating the enterprise adoption of Apache Hadoop. The event will be a gathering for the vibrant Apache Hadoop community of developers, data scientists, data professionals and solution providers and will be held at the historic Beurs van Berlage in Amsterdam on March 20-21, 2013.

    Call for papers now open!

    Apache Hadoop practitioners, enthusiasts and solution providers with an idea for a talk at the event, can submit your ideas now on the call for papers page.…

    Introduction

    The Apache Hadoop YARN meetup at Hortonworks on October 12, 2012 we previously announced was a resounding success. We had a very good turnout of around seventy people from the community.

    Meetup sessions
    Deployments at Yahoo!

    The meetup kicked off with YARN committers from Yahoo presenting on current Hadoop 2.0 deployments at Yahoo. As part of the presentation, the following were covered.

    • described scenarios where YARN positively advanced the state of the art like scalability, its current stability, the power of the YARN web-services, and its superlative performance compared to the previous versions.

    Today our partner, Teradata, announced availability of the Teradata Aster Big Analytics Appliance, which packages our Hortonworks Data Platform (HDP) with Teradata Aster on machine that is ready to plug-in and bring big data value in hours.

    There is more to this appliance than meets the eye…  it is not just a simple packaging of software on hardware. Teradata and Hortonworks engineers have been working together for months tying our solutions together and optimizing them for an appliance.…

    It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.0.2-alpha.

    This is the second (alpha) release of the next generation release of Apache Hadoop 2.x and comes with significant enhancements to both the major components of Hadoop:

    • HDFS HA has undergone significant enhancements since the previous release for NameNode High Availability
    • YARN has undergone significant testing and stabilization and validation as is been heavily battle-tested since the previous release.
    Go to page:« First...10...1516171819...Last »

    Thank you for subscribing!