Hadoop adoption is key to understanding big data

The Hadoop explosion continues, with the number of organizations adopting the platform growing at a compound annual growth rate of about 60 percent, according to International Data Corporation (IDC). However, ReadWrite's Matt Asay wrote that many of the companies adopting Hadoop are still acclimating to it and aren't yet using it to full capacity. Currently, most organizations that use Hadoop take advantage of its storage and ETL (extract, transform, load) features, but haven't taken the crucial steps toward optimizing big data analysis.
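The storage-and-ETL pattern most organizations start with can be sketched in miniature. Below is a hedged, single-machine illustration in Python; the log data, field names and filtering rule are invented for the example, and a list stands in for HDFS or a warehouse table:

```python
import csv
import io

# Minimal ETL sketch: extract raw records, transform them,
# and load the result into a target store. The raw log below
# is invented purely for illustration.
RAW_LOG = """user,action,ms
alice,click,120
bob,view,300
alice,view,95
"""

def extract(raw):
    """Extract: parse raw CSV text into dict records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records):
    """Transform: cast types and keep only slow events (>100 ms)."""
    out = []
    for r in records:
        r["ms"] = int(r["ms"])
        if r["ms"] > 100:
            out.append(r)
    return out

def load(records, target):
    """Load: append cleaned records to the target store."""
    target.extend(records)
    return target

warehouse = load(transform(extract(RAW_LOG)), [])
print(warehouse)
```

The point of the pattern is that each stage is independent; in a Hadoop deployment the same three steps run over files in HDFS rather than in-memory lists.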

"The fact that most enterprises have yet to get to analytics in any meaningful way is simply a description of where we are in the Hadoop market's evolution," Asay suggested.

Part of the lag, Asay asserted, stems from Hadoop's many components, which can make some users unwilling to spend time discovering what the platform has to offer. A recent report by CIO Insight illustrated the stratification of Hadoop users. Based on a survey of 107 data professionals, the report found that 68 percent used Hive, 57 percent employed MapReduce, 34 percent used Pig and 15 percent utilized native SQL. These discrepancies, said research analyst Matt Aslett at Hadoop Summit, can be ironed out by concentrating more on Hadoop's processing and integration stage.
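The survey's split between Hive and MapReduce reflects two styles of working with the same data: declarative queries versus hand-coded map and reduce steps. As a hedged sketch of the declarative style, SQLite stands in for Hive below (Hive's actual dialect, HiveQL, is similar but runs over tables stored in HDFS; the `page_views` table and its rows are invented for illustration):

```python
import sqlite3

# Hive-style declarative access: describe WHAT you want and let
# the engine plan the work. SQLite is only a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 10), ("about", 3), ("home", 7), ("docs", 5)],
)

# The same aggregation a MapReduce user would hand-code as a
# map (emit page, count) plus a reduce (sum per page).
rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('about', 3), ('docs', 5), ('home', 17)]
```

The popularity of Hive in the survey is consistent with this trade-off: a one-line `GROUP BY` is far more approachable than writing and tuning a MapReduce job for the same result.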

"Attempting to fast forward to analytics, missing out on the processing/integration stage, creates silos and will result in disillusionment," he observed.

Taking the steps toward optimizing Hadoop
High-functioning Hadoop, wrote Aslett, makes the most effective use of the big three of big data: volume, velocity and variety. Ultizer's Jonathan Gershater wrote that Apache Hadoop's open source approach directly affects how well those three V's work for an enterprise. Additionally, the growing capability of Hadoop software reduces a firm's costly reliance on specialized hardware.

"Because Hadoop is distributing the processing task, it can take advantage of cheap commodity hardware – compare this to processing all the data centrally on big expensive hardware," wrote Gershater.

Eventually, wrote Aslett, the big three converge into a single concern: the totality of big data. Total data is the smarter form of big data, a strategy of examining big data from all angles so that nothing falls through the cracks. Optimized Apache Hadoop allows big data to be stored, processed and integrated most effectively. Aslett described Hadoop as an ecosystem: a complex network of tools and resources that both shapes and is shaped by the technologies around it.

"The Hadoop ecosystem is vibrant, with strength in depth and breadth," Aslett wrote.

