Another week, another release… Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.
From the Dev Team
Follow the latest developments from our technical team
This post is authored by Jian He with Vinod Kumar Vavilapalli and is the seventh post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
Apache Hadoop 2 is in beta now.…
In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets. In that same internet minute, Amazon brought in $83,000 worth of sales. What would be the impact if you could identify:
- What is the most efficient path for a site visitor to research a product, and then buy it?
- What products do visitors tend to buy together, and what are they most likely to buy in the future?
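As a rough illustration of the second question, products bought together can be found by counting how often each pair of items shows up in the same order. The sketch below is a minimal, single-machine version in Python; the function name `pair_counts` and the sample baskets are hypothetical and exist only for illustration, and a real clickstream analysis on Hadoop would run over far larger data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample orders, not real sales data.
    sample = [
        ["laptop", "mouse"],
        ["laptop", "mouse", "bag"],
        ["bag", "mouse"],
    ]
    for pair, n in pair_counts(sample).most_common():
        print(pair, n)
```

Pairs with high counts are candidates for "frequently bought together" suggestions; the same counting idea carries over to a distributed job once the data no longer fits on one machine.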
Continuing our series of quick interviews with Apache Hadoop project committers and contributors at Hortonworks.
To follow on from yesterday’s Server Log processing with Apache Flume tutorial, we talk with Roshan Naik, Hortonworks engineer and Apache Flume contributor, about what Flume is, how it works and where it’s going.
When they’re not planning to overthrow their human overlords, most servers can be found spewing out vast amounts of data in the form of server logs. As we showed in our video - Deliver responsive IT from events in Server Logs - these logs contain a lot of value.
So if you fire up the Hortonworks Sandbox today, you’ll be delighted to find Tutorial 12: Refining and Visualizing Server Log Data as a step-by-step guide to the video. …
This post is authored by Zhijie Shen with Vinod Kumar Vavilapalli.
This is the sixth blog in the multi-part series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
The beta release of Apache Hadoop 2.x has finally arrived, and we are working hard to make the release easy to adopt, with little or no pain for our existing users.…
It’s my great pleasure to announce that the Apache Hadoop community has declared Hadoop 2.x as Beta with the vote closing over the weekend for the hadoop-2.1.0-beta release.
As noted in the announcement to the mailing lists, this is a significant milestone across multiple dimensions: not only is the release chock-full of significant features (see below), it also represents a very stable set of APIs and protocols on which we can continue to build for the future.…
The next in our series of quick interviews with Apache Hadoop project committers at Hortonworks.
In this video, we talk with Sanjay Radia, Hortonworks co-founder and Apache Hadoop committer, about the initiation of HDFS, the cost benefits it brings to data storage and future directions for the project.
Before I was a developer of Hadoop, I was a user of Hadoop. I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.
Have you ever wondered what’s happening during a namenode restart? A new feature coming in HDP 2.0 will give operators greater visibility into this critical process. This is a feature that would have been very useful to me in my prior role.…
UPDATE: This cheat sheet was so popular, we’ve created a PDF of the content below so you can print it and use it more easily. Download here.
If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.
From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started.…
If you want to understand the thinking in the various projects in the Hadoop ecosystem, then who better to talk to than key members of those projects – the committers.
In this video, we talk with Owen O’Malley, Hortonworks co-founder and Apache Hive committer, about the initiation of Hive, why it matters and future directions for the project.
In this blog we’ll set up NFS for HDFS access with the Hortonworks Sandbox 1.3. This allows files to be read from and written to Hadoop using methods familiar to desktop users. Sandbox is a great way to understand this particular type of access.
If you don’t have it already, then download the sandbox here. Got the download? Then let’s get started.
Start the Sandbox. Get to this screen.
We will now enable Ambari so that we can edit the configuration to enable NFS.…
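Once HDFS is reachable through an NFS mount, ordinary file operations are all a program needs. The Python sketch below illustrates that idea under stated assumptions: the helper `write_then_read` is hypothetical, and a temporary local directory stands in for the mount point, so it demonstrates plain file I/O against a directory rather than the Sandbox's actual NFS configuration.

```python
import os
import tempfile


def write_then_read(mount_point, name, text):
    """Write text to a file under mount_point, then read it back with plain file I/O."""
    path = os.path.join(mount_point, name)
    # Sequential write, exactly as for any local file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the NFS mount point.
    # On a real setup you would pass the directory where HDFS is mounted.
    with tempfile.TemporaryDirectory() as mount_point:
        print(write_then_read(mount_point, "hello.txt", "hello from the desktop\n"))
```

Nothing in the code is Hadoop-specific, which is the point of this kind of access: existing desktop tools and scripts can work with the files unchanged.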
Today we released the Hortonworks Data Platform 1.3 for Windows, supporting Windows Server 2008 R2 and 2012. This is an exciting major update to the only Enterprise Hadoop distribution on Windows. In this blog post, I will discuss what’s new and how to get started.
Enabling new data applications
This release brings component parity to the HDP Stack across all operating systems by adding the following components:
- Apache HBase (0.94.6.1) is a non-relational (NoSQL) database that runs on top of the Hadoop® Distributed File System (HDFS).
Thanks to all who joined us for last week’s webinar on Apache Hadoop YARN: Enabling Next Generation Data Applications. You can listen to the full webinar replay here, and the slides are embedded below.
If you’re already diving into YARN, then we will be hosting the first ‘Office Hours’ sessions at Hortonworks HQ. Join us on August 15th for a Deep Dive on Hoya (HBase on YARN).
Office hours will give you a chance to talk with the Hortonworks developers deeply involved in the YARN and Hoya projects, as well as with peers just launching their own YARN projects. …
The Hortonworks Sandbox is a great tool not only for learning Hadoop, but also for experimentation and application development. Deployment in a type 2 hypervisor such as Oracle VirtualBox or VMware Workstation is straightforward and serves the needs of a single user. Sandbox can also be deployed to IaaS environments, and in this post we walk through the steps of deploying Hortonworks Sandbox on OpenStack. For the purposes of this article, the author used the OpenStack Grizzly release running QEMU-KVM as the underlying hypervisor.…