Apache Hadoop and Data Agility

In a recent blog post I described four reasons for using Hadoop for data science. In this post I would like to dive deeper into the last of them: data agility.

In most existing data architectures, which are built on relational database systems, the data schema is of central importance and must be designed and maintained carefully over the lifetime of the project. Furthermore, whatever data fits into the schema is stored, and everything else is typically ignored and lost. Changing the schema is a significant undertaking, one that most IT organizations don’t take lightly; in fact, it is not uncommon for a schema change in an operational RDBMS to take 6-12 months, if not more.

Hadoop is different. A schema is not needed when you write data; instead, the schema is applied when the data is used by some application, hence the term “schema on read”.
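To make "schema on read" concrete, here is a minimal sketch in plain Python (not Hadoop code). Raw records are stored exactly as they arrived, and each consumer applies its own schema only when it reads them. The records, field names, and the `read_with_schema` helper are hypothetical and exist only for illustration. In the Hadoop ecosystem, tools such as Apache Hive play a similar role by applying a table definition to files at query time.

```python
import csv

# Hypothetical raw data as it might land in storage: nothing checks these
# rows against a schema when they are written, so they can differ in shape.
RAW_LINES = [
    "2014-10-01,alice,42.5,mobile",
    "2014-10-02,bob,17",
    "2014-10-02,carol,99.9,web,promo-code-7",
]


def read_with_schema(lines, schema):
    """Apply a schema at read time (hypothetical helper).

    `schema` is a list of (field_name, type_converter) pairs. Extra fields in
    a row are ignored by this reader (but remain in storage); fields missing
    from a row are filled with None instead of being rejected.
    """
    records = []
    for row in csv.reader(lines):
        record = {}
        for (name, cast), value in zip(schema, row):
            record[name] = cast(value)
        for name, _ in schema[len(row):]:
            record[name] = None
        records.append(record)
    return records


# Two hypothetical consumers read the same raw data with different schemas.
SALES_SCHEMA = [("date", str), ("user", str), ("amount", float), ("channel", str)]
USER_SCHEMA = [("date", str), ("user", str)]

if __name__ == "__main__":
    for record in read_with_schema(RAW_LINES, SALES_SCHEMA):
        print(record)
    for record in read_with_schema(RAW_LINES, USER_SCHEMA):
        print(record)
```

The extra field in the third record stays in storage even though neither schema uses it, so a later schema could pick it up without rewriting or migrating the stored data.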

With Hadoop, storing a new type of data is as simple as creating a new folder and pushing the new data files into that folder. It doesn’t require an IT project to redesign the schema and upgrade production systems with that new schema.
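The folder-based workflow can be sketched the same way. On a real cluster this would typically be done with the HDFS shell commands `hdfs dfs -mkdir` and `hdfs dfs -put`. The sketch below uses the local filesystem as a stand-in for HDFS so that it runs anywhere. The `land_new_dataset` and `read_lines` helpers, the dataset names, and the file names are hypothetical.

```python
import os
import tempfile


def land_new_dataset(root, dataset_name, files):
    """Store a new type of data by creating a folder and writing files into it.

    Hypothetical helper: no schema is declared, and existing dataset folders
    under `root` are left untouched.
    """
    target = os.path.join(root, dataset_name)
    os.makedirs(target, exist_ok=True)
    for filename, content in files.items():
        with open(os.path.join(target, filename), "w", encoding="utf-8") as f:
            f.write(content)
    return target


def read_lines(directory):
    """Read every file in a dataset folder as raw lines (hypothetical helper).

    Any schema would be applied to these lines afterwards, at read time.
    """
    lines = []
    for filename in sorted(os.listdir(directory)):
        with open(os.path.join(directory, filename), encoding="utf-8") as f:
            lines.extend(f.read().splitlines())
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        land_new_dataset(root, "sales", {"part-00000.csv": "2014-10-01,alice,42.5\n"})
        # A new type of data is just a new folder; "sales" is not modified.
        clicks = land_new_dataset(root, "clickstream", {"part-00000.log": "u1 /home\nu2 /cart\n"})
        print(sorted(os.listdir(root)))  # ['clickstream', 'sales']
        print(read_lines(clicks))        # ['u1 /home', 'u2 /cart']
```

Adding the `clickstream` dataset required no schema declaration and no change to the existing `sales` data, which mirrors the point above.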

Teams developing data products on Hadoop benefit from much shorter development cycles and are able to test 5-10x more hypotheses in a given time frame. People quickly notice these shorter innovation cycles, and more teams start using Hadoop to gain the same benefit.

By the way, although Hadoop is mostly used to store and process really big datasets (aka big data), this data agility benefit applies to any dataset stored on Hadoop, big or small.