He loves me, he loves me not… using daisies to figure out someone’s feelings is so last century. A much better way to determine whether someone likes you, your product or your company is to analyze Twitter feeds to get better data on what the public is saying. But how do you take thousands of tweets and process them? Our video – Understand your customers’ sentiments with Social Media Data – shows how you can capture a Twitter stream to do Sentiment Analysis.…
From the Dev Team
Follow the latest developments from our technical team
We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.
This week Venkat Ranganathan discusses using Apache Sqoop for bulk data movement between Hadoop and enterprise data stores. Sqoop can also move data the other way, from Hadoop into an EDW.
Venkat is a Hortonworks engineer and Apache Sqoop committer who wrote the connector between Sqoop and the Netezza data warehousing platform. He also worked with colleagues at Hortonworks and in the Apache community to improve integration between Sqoop and Apache HCatalog, delivered in Sqoop 1.4.4.…
As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.
In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment.…
The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance.
Higher Compression
ORCFile was introduced in Hive 0.11 and offered excellent compression, delivered through a number of techniques including run-length encoding, dictionary encoding for strings and bitmap encoding.
This focus on efficiency leads to some impressive compression ratios. This picture shows the sizes of the TPC-DS dataset at Scale 500 in various encodings.…
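To make two of the techniques named above concrete, here is a toy sketch of dictionary encoding for strings followed by run-length encoding of the resulting codes. It only illustrates the general ideas and is not ORCFile's actual on-disk format; the function names and the sample column are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(strings: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with a small integer code (illustrative only).

    Returns the dictionary (distinct strings in first-seen order) and the
    list of codes, one per input value.
    """
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for s in strings:
        if s not in index:
            index[s] = len(dictionary)
            dictionary.append(s)
        codes.append(index[s])
    return dictionary, codes


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical sample column with long runs of repeated strings.
column = ["CA", "CA", "CA", "NY", "NY", "CA"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)

# Round trip: decoding the runs and looking up the codes restores the column.
restored = [dictionary[c] for c in run_length_decode(runs)]
```

Columns with few distinct values and long runs benefit most from this combination, since each distinct string is stored once and each run collapses to a single pair.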
The Stinger Initiative is Hortonworks’ community-facing roadmap laying out the investments Hortonworks is making to improve Hive performance 100x and evolve Hive to SQL compliance to simplify migrating SQL workloads to Hive.
We launched the Stinger Initiative along with Apache Tez to evolve Hadoop beyond its MapReduce roots into a data processing platform that satisfies the need for both interactive query AND petabyte scale processing. We believe it’s more feasible to evolve Hadoop to cover interactive needs rather than move traditional architectures into the era of big data.…
We hosted a webinar on YARN a couple of weeks ago (see the slides and playback here). As you might expect, there were a lot of great questions, and here is a set of answers to those questions.
Our next YARN-oriented Office Hours will be held online on Sept 11th at 2pm PST. Join us on Meetup!
Who is using YARN and what benefits have they received from it?
One great public example of YARN in production use is at Yahoo!.…
Another week, another release… Following the release of Apache Hadoop 2.0 beta last week, we are excited to release the beta of Hortonworks Data Platform 2.0, the first commercial release of the stable YARN API and protocols on which new applications can now be built.
This post is authored by Jian He with Vinod Kumar Vavilapalli and is the seventh post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
Apache Hadoop 2 is in beta now.…
In the last 60 seconds there were 1,300 new mobile users and 100,000 new tweets. While you contemplate what happens in an internet minute, Amazon brought in $83,000 worth of sales. What would be the impact if you could identify:
- What is the most efficient path for a site visitor to research a product, and then buy it?
- What products do visitors tend to buy together, and what are they most likely to buy in the future?
Continuing our series of quick interviews with Apache Hadoop project committers and contributors at Hortonworks.
To follow on from yesterday’s Server Log processing with Apache Flume tutorial we talk with Roshan Naik, Hortonworks engineer and Apache Flume contributor, about what Flume is, how it works and where it’s going.
When they’re not planning to overthrow their human overlords, most servers can be found spewing out vast amounts of data in the form of server logs. As we showed in our video - Deliver responsive IT from events in Server Logs - these logs contain a lot of value.
So if you fire up the Hortonworks Sandbox today, you’ll be delighted to find Tutorial 12: Refining and Visualizing Server Log Data as a step-by-step guide to the video. …
This post is authored by Zhijie Shen with Vinod Kumar Vavilapalli.
This is the sixth blog in the multi-part series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
The beta release of Apache Hadoop 2.x has finally arrived, and we are working hard to make the release easy to adopt, with little or no pain for our existing users.…
It’s my great pleasure to announce that the Apache Hadoop community has declared Hadoop 2.x as Beta with the vote closing over the weekend for the hadoop-2.1.0-beta release.
As noted in the announcement to the mailing lists, this is a significant milestone across multiple dimensions: not only is the release chock-full of significant features (see below), it also represents a very stable set of APIs and protocols on which we can continue to build for the future.…
The next in our series of quick interviews with Apache Hadoop project committers at Hortonworks.
In this video, we talk with Sanjay Radia, Hortonworks co-founder and Apache Hadoop committer, about the initiation of HDFS, the cost benefits it brings to data storage and future directions for the project.
Before I was a developer of Hadoop, I was a user of Hadoop. I was responsible for operation and maintenance of multiple Hadoop clusters, so it’s very satisfying when I get the opportunity to implement features that make life easier for operations staff.
Have you ever wondered what’s happening during a namenode restart? A new feature coming in HDP 2.0 will give operators greater visibility into this critical process. This is a feature that would have been very useful to me in my prior role.…