Apache Hadoop and Data Agility

In a recent blog post I mentioned the 4 reasons for using Hadoop for data science. In this blog post I would like to dive deeper into the last of these reasons: data agility.

In most existing data architectures, based on relational database systems, the data schema is of central importance, and needs to be designed and maintained carefully over the lifetime of the project. Furthermore, whatever data fits into the schema will be stored, and everything else typically gets ignored and lost. Changing the schema is a significant undertaking, one that most IT organizations don’t take lightly. In fact, it is not uncommon for a schema change in an operational RDBMS system to take 6-12 months if not more.

Hadoop is different. A schema is not needed when you write data; instead the schema is applied when using the data for some application, thus the concept of “schema on read”.

With Hadoop, storing a new type of data is as simple as creating a new folder and pushing the new data files into that folder. It doesn’t require an IT project to redesign the schema and upgrade production systems with that new schema.

Teams developing data products using Hadoop benefit from much shorter development cycles, and are able to test 5-10x more hypotheses in a given time-frame. Very quickly, people notice the shorter innovation cycles, and more teams start using Hadoop to gain the same benefit.

By the way, although Hadoop is mostly used to store and process really big datasets (aka big data), this benefit of data agility is true for any dataset stored on Hadoop, big or small.

Categorized by :
Apache Hadoop Big Data

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade having been built, tested and hardened with enterprise rigor.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower cost, higher capacity infrastructure.
Contact Us
Hortonworks provides enterprise-grade support, services and training. Discuss how to leverage Hadoop in your business with our sales team.

Thank you for subscribing!