Where is Hadoop headed?

The early years of Hadoop development focused on creating a platform that could handle analytics at a large scale, but the coming years will more likely focus on refining the environment to work at faster speeds, according to panelists at the recent Structure: Data 2013 conference. Reporting from several panels at the event, GigaOM noted that running queries on large data sets is now a manageable process supported by a wealth of applications. Developers are now setting their sights on introducing more interactivity to the Hadoop ecosystem.

Hadoop needs to move in the direction of offering fast, predictive capabilities so that it can fulfill a role similar to Google's "I'm Feeling Lucky" search function, Omer Trajman of analysis applications company WibiData suggested. Users should be able to plug in queries and get smart, dynamic responses. For this to happen, companies will need Hadoop to deliver "high throughput, low latency," analytics executive Muddu Sudhakar said.

Reaching these kinds of speeds will be challenging, but greater interactivity will unlock new uses for Hadoop and big data, experts suggested. Silvius Rus, director of big data platforms for Quantcast, explained that businesses need to be able to quickly test ideas and get answers back in minutes instead of days. For instance, companies should be able to immediately identify customer service issues and respond to problems, Ashok Srivastava, chief data scientist at Verizon, said. Additionally, big data analysis should be able to help with research in cybersecurity and certain scientific fields by crowdsourcing information.

"Imagine taking your cell phone pictures and combining them with multiple millions of other cell phone pictures," he said, according to GigaOM. "That's something that can be used by scientists."

To get this type of real-time feedback, the Hadoop community will likely have to continue to develop the functionality of tools such as Hive and Pig, but applications that nudge the world toward "big data utopia" – in the words of one panel moderator – could be coming within months, Srivastava said. By coupling speed with scale, the potential for insights from Hadoop continues to grow.

