Ease into Hadoop implementation by using it for data warehouses

Enthusiasm for big data is high in principle, but as enterprises begin to experiment with Hadoop clusters, many are finding that they are unsure exactly how the new technology fits into their organization. In a recent blog post, Gartner analyst Svetlana Sicular observed that users are eager to implement Hadoop but do not know where to start.

She suggested that the technology is already reaching the "Trough of Disillusionment" stage in the research firm's Hype Cycle. Some clients are unsure where to begin their projects, while even those with established data programs worry that they are falling behind competitors. Others lack clarity on how to tackle the problems they want to solve, or even on what questions to ask. Sicular advised organizations not to despair.

"Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field)," Sicular noted.

An initial Hadoop project
According to data warehousing expert Rob Klopp, one low-risk entry point for organizations looking to use Hadoop but unsure of where it fits into their operations is as a staging area for a data warehouse. In many enterprises, only data that will be part of a user query is moved to a data warehouse, and the more detailed raw information is left largely untouched. The staging area serves as a sort of raw data warehouse, holding onto this information without aggregating it.

Since Hadoop runs on inexpensive hardware and software, it is ideal for holding this type of raw data, Klopp suggested. By using Hadoop for a staging area, enterprises also position themselves for later analytics, transformation and aggregation jobs that might use this raw data.
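The staging pattern Klopp describes can be sketched in miniature: land the raw, detailed records untouched, and only aggregate later, when a warehouse-load job needs summarized data. The sketch below uses the local filesystem and plain Python as a stand-in for HDFS and a Hive or MapReduce job; the file paths and record layout are hypothetical, chosen only to illustrate the two-step flow.

```python
# Illustrative sketch of the staging-area pattern: raw data is landed
# unmodified, and aggregation happens in a separate, later step.
# On a real cluster the landing step would be an HDFS ingest and the
# aggregation a Hive or MapReduce job; here both are simulated locally.
import csv
import os
import tempfile
from collections import defaultdict

staging_dir = tempfile.mkdtemp()  # stand-in for an HDFS staging directory

# Step 1: land raw, detailed records untouched -- no aggregation at ingest.
raw_events = [
    ("2013-07-01", "store_12", 19.99),
    ("2013-07-01", "store_12", 5.49),
    ("2013-07-02", "store_07", 12.00),
]
raw_path = os.path.join(staging_dir, "sales_raw.csv")
with open(raw_path, "w", newline="") as f:
    csv.writer(f).writerows(raw_events)

# Step 2: a later warehouse-load job reads the raw staging data and
# aggregates only what user queries need (e.g., daily totals per store).
daily_totals = defaultdict(float)
with open(raw_path, newline="") as f:
    for day, store, amount in csv.reader(f):
        daily_totals[(day, store)] += float(amount)
```

Because the detailed records survive in the staging area, later jobs can re-aggregate them differently (by store, by week, by product) without re-ingesting from source systems, which is the low-risk entry point Klopp has in mind.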

"The economics of Hadoop make it the likely repository for big data," Klopp wrote. "Using Hadoop as the staging area for your data warehouse data might provide a low risk way to get started with Hadoop… with an ROI… preparing your staff for other Hadoop things to come."


