Apache Hadoop hits its stride

Nearly every technological advancement had a somewhat rocky beginning, when user rates were low and early adopters struggled to optimize its benefits without a clear roadmap to success. Apache Hadoop was not immune to this process. InformationWeek's Jeff Bertolucci recently examined the history of Hadoop, noting that although the platform showed unquestionable promise in its early days, predicting that it would enjoy the level of success and wide user base that it has accumulated since then would have been a risky endeavor. Today, Hadoop stands as the benchmark for big data design, leveraged by Fortune 500 companies that custom build their analytics ventures. Hortonworks vice president John Kreisa concurred that the Apache Hadoop platform has quickly moved from something of a curiosity to a full-blown market leader.

"I've been working with the technology for three or four years now, and over that time Hadoop has gone from the experimental, 'We've got a test cluster set up,' to 'OK, here's what we're going to do with it,'" Kreisa told the source. He continued, "Effectively, Hadoop has matured now as a technology such that mainstream enterprises are using it for a wide variety of workloads."

The need for greater expertise
Kreisa noted that one of the barriers to wider adoption levels of the platform was the need for more data scientists with Hadoop training. However, the dearth of data analytics experts has been well documented across the big data market and is not a Hadoop-specific issue. Dell Software executive Guy Harrison outlined the many skills that a Hadoop-focused data scientist should have in a post he wrote for VentureBeat. The three main attributes identified were statistics, parallel programming and algorithmic abilities. Acquiring employees who meet this criteria can significantly improve a business' Apache Hadoop project.

Despite some of these challenges, Harrison explained that there were significant benefits to the Hadoop framework. For example, Hadoop clusters are capable of storing a massive amount of data, which is necessary to generate the most accurate results from a given big data project. In addition, this information is replicated across multiple nodes, providing a data backup option in the event that a research team experiences a system failure. 

Accommodating multiple forms of data
Harrison also noted that the Apache Hadoop architecture is able to accomplish a feat that traditional databases are ill-equipped to take on: storing and processing unstructured data. A standard database can only tackle data that is in the form of a structured schema. This means that those users will lose any value that video or audio files could add to an analytics project, for example. However, with Apache Hadoop, researchers can leverage many different forms of information to obtain accurate and actionable insights.

Categorized by :

Leave a Reply

Your email address will not be published. Required fields are marked *

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade having been built, tested and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.