
Viewing posts by: Russell Jurney




Ambari on EC2

Check out our new knowledgebase article on Ambari on EC2. With these instructions, you can boot an EC2 Apache Hadoop cluster in minutes using Ambari.

Big Data Defined

‘Big Data’ has become a hot buzzword, but a poorly defined one. Here we will define it. Wikipedia defines Big Data in terms of the problems posed by the awkwardness of legacy tools in supporting massive datasets: In information technology, big data[1][2] is a collection of data sets so large and complex that it becomes […]

In this post, we’ll explain the difference between Hadoop 1.0 and 2.0. After all, what is Hadoop 2.0? What is YARN? For starters – what is Hadoop and what is 1.0? The Apache Hadoop project is the core of an entire ecosystem of projects. It consists of four modules (see here): Hadoop Common: The common […]

In part one of this series, we covered how to download your tweet archive from Twitter, ETL it into JSON/newline format, and extract a Hive schema. In this post, we will load our tweets into Hive and query them to learn about our little world. To load our tweet-JSON into Hive, we’ll use the […]

Note: Continued in part two… Your Twitter Archive Twitter has a new feature, Your Twitter Archive, that enables any user to download their tweets as an archive. To view this feature, look at the bottom of your account settings page. There should be an option for ‘Your Twitter archive,’ which will generate […]

Touring Ambari

Hot on the heels of the release of the new version of Sandbox, I thought it would be worth a look at Ambari as it is now integrated into the Sandbox VM. You can download the Hortonworks Sandbox and try it out for yourself! Apache Ambari is a web-based tool for provisioning, managing, and monitoring […]

Installing the Hortonworks Data Platform for Windows couldn’t be easier. Let’s take a look at how to install a one-node cluster on your Windows Server 2012 machine. Follow @hortonworks to let us know if you’d like more content like this. To start, download the HDP for Windows MSI. It is about 460MB, and […]

Apache Pig version 0.11 was released last week. An Apache Pig blog post summarized the release. New features include: A DateTime datatype, documentation here. A RANK function, documentation here. A CUBE operator, documentation here. Groovy UDFs, documentation here. And many improvements. Oink it up for Pig 0.11! Hortonworks’ Daniel Dai gave a talk on Pig […]

Pig can easily stuff Redis full of data. To do so, we’ll need to convert our data to JSON. We’ve previously talked about pig-to-json in JSONize anything in Pig with ToJson. Once we convert our data to JSON, we can use the pig-redis project to load Redis. Build the pig-to-json project: git clone […]

According to the Transaction Processing Performance Council, TPC-H is: The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine […]

The Hadoop Summit Europe official call for papers ends this Friday, November 30th, so be sure to get your session submissions in this week! Hadoop Summit Europe is March 20-21 at the Beurs van Berlage in Amsterdam, Netherlands. You still have time to submit an abstract now! The four content tracks are: Applied […]

Track Chairs have been named for Hadoop Summit Europe. Track Chairs will, in turn, select their track committees who, as a team, will decide which sessions are to be presented at Hadoop Summit Europe. They are as follows: Operating Hadoop – Evert Lammerts, SARA I joined Sara as a technical consultant in October 2008. In […]

Hackathon and Aeromuseum Reception ApacheCon Europe kicked off yesterday with an all-day Hackathon followed by a committers’ reception at the Sinsheim Technik Museum, which has, among other large aircraft, a Concorde in Air France livery. My favorite was the diesel engine from a U-Boat, with its enormous drive-shaft and pistons. Taking the Guesswork […]

Agile Data hits the road this month, crossing Europe with the good news about Hadoop and teaching Hadoop users how to build value from data using Hadoop to build analytics applications. We’ll be giving out discount coupons to Hadoop Summit Europe, which is March 20-21st in Amsterdam! 11/3 – Agile Data @ The Warsaw Hadoop Users […]

You don’t see many demos like the one given by Shawn Bice (Microsoft) today in the Regent Parlor of the New York Hilton, at Strata NYC. “Drive Smarter Decisions with Microsoft Big Data,” was different. For starters – everything worked like clockwork. Live demos of new products are notorious for failing on-stage, even if they […]