Hadoop Insights

News about Hadoop in the wild; how Hadoop is being used; how Hadoop can be used.

Behind all the Big Data hype, there is one common thread: Apache Hadoop and its associated components ARE the technology platform of choice. And here at Hortonworks, that’s what we do: Hadoop.

That is also why we are so excited about the incredible growth in customers who have chosen to work with us to ensure the success of their Hadoop implementations and realize their vision of a modern data architecture.

Here are the key reasons we believe that we can best help your enterprise with Apache Hadoop.…

You did it! Last Sunday we challenged you to “Learn Hadoop in 7 days”. We hope you rose to the challenge and kept up with the tutorials we posted each day through Twitter and Facebook. These tutorials should have helped you delve into a range of Hadoop topics.

By now, you should feel comfortable with Hadoop clickstream analysis, Hortonworks ODBC driver configuration, and many other important components of Hadoop.…

Apache Storm and YARN extend Hadoop to handle real-time data processing, providing the ability to process and respond to events as they happen. Our customers have described many use cases for this technology combination, and below we present a demo, complete with code, so you can try it yourself.

For the demo below, we used our Sandbox VM, a full implementation of the Hortonworks Data Platform.…
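The demo code itself isn't reproduced in this excerpt, but the core idea — reacting to each event as it arrives rather than waiting for a batch — can be sketched in a few lines. This is an illustrative stand-in for what a Storm bolt does at scale, not the actual demo; the function name and alert threshold are made up for the example.

```python
from collections import Counter

def process_stream(events, threshold=3):
    """Consume events one at a time, emitting an alert the moment any
    event type crosses the threshold -- the respond-as-it-happens
    pattern that a Storm bolt implements on a distributed stream."""
    counts = Counter()
    alerts = []
    for event in events:
        counts[event] += 1
        if counts[event] == threshold:  # react immediately, not after a batch
            alerts.append(f"ALERT: {event} seen {threshold} times")
    return alerts
```

In a real Storm topology the same logic would live in a bolt's `execute` method, with YARN managing the cluster resources the topology runs on.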

A lot of people ask me: how do I become a data scientist? I think the short answer is: as with any technical role, it isn’t necessarily easy or quick, but if you’re smart, committed and willing to invest in learning and experimentation, then of course you can do it.

In a previous post, I described my view on “What is a data scientist?”: it’s a hybrid role that combines the “applied scientist” with the “data engineer”. …

How big is big anyway? What sort of size and shape does a Hadoop cluster take?

These are great questions as you begin to plan a Hadoop implementation. Designing and sizing a cluster is complex, and something our technical teams spend a lot of time working through with customers: from storage size to growth rates, from compression rates to cooling, there are many factors to take into account.

To make that a little more fun, we’ve built a cluster-size-o-tron which performs a simplified calculation, based on some assumptions about node sizes and data payloads, to give an indication of how big your particular big is.…
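The kind of back-of-the-envelope math such a calculator performs can be sketched as follows. Every number here (3x replication, 2:1 compression, 25% temp-space overhead, 24 TB usable per node) is an illustrative assumption, not a Hortonworks recommendation — real sizing depends on the factors listed above.

```python
import math

def estimate_nodes(raw_tb, replication=3, compression=2.0,
                   temp_overhead=1.25, node_usable_tb=24):
    """Rough Hadoop cluster sizing: compress the raw data, replicate it
    across HDFS, reserve headroom for intermediate job output, then
    divide by per-node usable capacity. All defaults are illustrative."""
    stored_tb = raw_tb / compression * replication * temp_overhead
    return math.ceil(stored_tb / node_usable_tb)
```

For example, 100 TB of raw data under these assumptions works out to roughly 188 TB stored, or an 8-node cluster — a starting point for a conversation, not a design.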

Just a couple of weeks ago we published our simple SQL to Hive Cheat Sheet. That has proven immensely popular with folks looking to understand the basics of querying with Hive. Our friends at Qubole were kind enough to work with us to extend and enhance the original cheat sheet with a more advanced feature of Hive: User Defined Functions (UDFs). In this post, Gil Allouche of Qubole takes us from the basics of Hive through to getting started with more advanced uses, which we’ve compiled into another cheat sheet you can download here.…
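The cheat sheet covers UDFs written in Java, but Hive can also call out to external scripts via its `TRANSFORM ... USING` clause, which streams rows to the script as tab-separated lines on stdin. Here's a minimal, hypothetical example — the script name, column layout, and domain-normalizing logic are all made up for illustration.

```python
import sys

def normalize_domain(url):
    """Strip scheme, path, and a leading 'www.' from a URL -- the kind
    of one-off row logic you might push into a Hive query."""
    host = url.split("://", 1)[-1].split("/", 1)[0].lower()
    return host[4:] if host.startswith("www.") else host

if __name__ == "__main__":
    # Hive streams each row as a tab-separated line on stdin.
    for line in sys.stdin:
        url, count = line.rstrip("\n").split("\t")
        print(f"{normalize_domain(url)}\t{count}")
```

Invoked from Hive it would look something like `SELECT TRANSFORM(url, cnt) USING 'python normalize.py' AS (domain, cnt) FROM page_views;` — a lighter-weight alternative when a full Java UDF is overkill.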

Syncsort, a Hortonworks technology partner, helps organizations propel Hadoop projects with a tool that makes it easy to “Collect, Process and Distribute” data with Hadoop. This process, often called ETL (Extract, Transform, Load), is one of the key drivers for Hadoop initiatives; but why is this technology a key enabler of Hadoop? To find out, we talked with Syncsort’s Director of Strategy, Steve Totman, a 15-year veteran of data integration and warehousing, who shared his perspective on Data Warehouse Staging Areas.…

Building a modern data architecture in which Hadoop delivers high-scale, low-cost data processing means integrating Hadoop effectively inside the data center. For this post, we asked Yves de Montcheuil, VP of Marketing at Talend, about his customers’ experiences with Hadoop integration. Here’s what he had to say:

Most organizations are still in the early stages of big data adoption, and few have thought beyond the technology angle of how big data will profoundly impact their processes and their information architecture.…

Think Big Analytics, a Hortonworks systems integration partner, has been helping customers navigate the complex world of Hadoop for the past three years.  In that time they have seen it all, and have developed one of the most mature Hadoop implementation methodologies in the industry.  Recently, we asked Ron Bodkin, Founder and CEO of Think Big Analytics, to share some insight.

What are the “Must-Dos” Before Starting a Big Data Project?…

Historical data is now an essential tool for businesses as they struggle to meet increasingly stringent regulatory requirements, manage risk and perform predictive analytics that help improve business decisions. And while recent data may be available from an enterprise data warehouse, the traditional practice of archiving old data offsite on tape makes business analytics challenging, if not impossible, because the historical information needed is simply unavailable.

Fortunately, the modern approach to data storage for business analytics uses technologies like virtualization and Hadoop clusters to enable partitioned access to historical data.…

This is a guest post from Sofia Parfenovich, Data Scientist at Altoros Systems, a big data specialist and a Hortonworks systems integrator partner. Sofia explains how she optimized a customer’s trading solution by using Hadoop (Hortonworks Data Platform) to cluster stock data.

Automated trading solutions are widely used by investors, banks, funds, and other stock market players. These systems are based on complex mathematical algorithms and can take into account hundreds of factors.…
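Sofia's post describes clustering stock data at Hadoop scale; as a toy illustration of the clustering step itself, here is a minimal k-means over 2-D feature points (say, volatility and return per instrument). This is a self-contained sketch, not the actual Altoros pipeline — there, the same idea would run as distributed jobs over far more data and factors.

```python
import random

def kmeans(points, k, iters=20, seed=42):
    """Minimal k-means: repeatedly assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # index of the nearest center by squared Euclidean distance
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[i].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers
```

Grouping instruments that behave alike lets a trading model be tuned per cluster rather than per stock, which is the optimization angle the post explores.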

If you’re considering the WHY, the HOW and the WHAT of Hadoop and Big Data in your business, then this collection of papers and ebooks is your friend.

  • WHY does Hadoop matter? Our eBook “Disruptive Possibilities of Big Data” paints a picture of the future of the data-driven business and how it changes everything.
  • HOW does Hadoop work in my data architecture? As part of a modern data architecture, Hadoop sits alongside existing infrastructure and augments its capabilities through Refining and Exploring big datasets and ultimately enriching the application and customer experiences for your business.

Airline pricing has always been a mystery to me: a combination of art and science that lets an airline make as much money as possible on each flight while giving customers the options and flexibility they want. Under the covers, I know airlines use complex models to determine how many seats have been sold and how much they can charge for the remaining ones. I hadn’t realized just how complex those models are, nor, more importantly, how big an opportunity the travel industry has to become more customer-centric while staying competitive by harnessing the data now available to it.…

By now, you’re probably well aware of what Hadoop does:  low-cost processing of huge amounts of data. But more importantly, what can Hadoop do for you?

We work with customers across many industries, each with their own specific data challenges, but in talking to so many of them we are also able to see patterns emerge in certain types of data and the value they can bring to a business.

We love to share these kinds of insights, so we built a series of video tutorials covering some of those scenarios.

Some more detailed discussion of these types of data is in our ‘Business Value of Hadoop’ whitepaper.…

The following is a guest post from Scott Gnau, President, Teradata Labs

I continue to be astonished by the evolution of Apache Hadoop, the software framework for large scale computing that has flourished thanks to a dynamic open source ecosystem. An army of contributors, including the smart engineers and contributors at Hortonworks, constantly refines Hadoop’s ability to manage massive amounts of data on computer clusters via MapReduce processing and the underlying Hadoop Distributed File System (HDFS).…