Get Started with Hadoop: Develop, Analyze and Operate

There’s an old proverb you’ve likely heard about blind men trying to identify an elephant. Depending on the version of the proverb, the elephant is misidentified variously as a rope, a wall, a pillar, a basket, a brush and more. Oddly, no one identified it as a next-generation enterprise data platform, but then it is an old proverb.

The Hadoop elephant is a platform, though, and as such the proverb holds true: depending on your perspective, it offers different capabilities, components and integration points to meet your requirements. To that end, we’ve reorganized some of our technical content around three groups of activities, each with its own specific needs. Of course, you’ll recognize these needs and roles from your existing teams:

DEVELOP. Developers can use Hadoop in multiple ways. You may be building alongside Hadoop to collect data, moving it from its point of creation to the cluster for analysis. Perhaps you’re processing and refining that data for improved analysis. Or maybe you’re building next-generation apps atop Hadoop, or taking advantage of the insights derived by your analyst teams.
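The processing-and-refining work described above classically happens through Hadoop’s MapReduce programming model. As a rough, Hadoop-free sketch of that model, here is a word count in plain Python; the function names and structure are illustrative only, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word on every input line."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values; here, sum the counts."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 2
```

In a real Hadoop job the map and reduce steps run in parallel across the cluster and the shuffle is handled by the framework, but the shape of the computation is the same.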

ANALYZE. Data in Hadoop can be vast and varied. Data analysts may be exploring it as data scientists do, applying analytics techniques; operationalizing queries over known data for repeated use; and delivering the resulting insights in ways that a business or end user can act upon.

OPERATE. Crucially, Hadoop operates within a modern data architecture, so administrators need to be able to provision, manage and monitor clusters that integrate and interoperate with the existing components of that architecture.

Sound familiar? Of course it does. Getting started with Hadoop is about taking your existing skills and tools and applying them to data at a new scale. Enjoy.


Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade, having been built, tested and hardened with enterprise rigor.

Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured, alongside a set of hands-on, step-by-step Hadoop tutorials.

Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower-cost, higher-capacity infrastructure.