A few weeks back we posted a definition of “big data”. There was plenty of internal conversation about the term and whether this definition had captured what it means. The upshot: it is a loaded term. It means a lot of different things to a lot of different people.
When I first joined Hortonworks, I bought into the three V’s (volume, velocity, and variety) definition of big data. It works for the most part, but it only describes the characteristics of the data. The definition is cold and lacks soul. After all, “big data” represents the promise of “big” business value.
A “Value” Definition of Big Data
Big Data = Transactions + Interactions + Observations.
I gravitate to this definition because it outlines WHAT the data is, not just its characteristics. It points to areas that we should focus on as businesses, and it speaks to the value a bit more. Each of the three components is important.
- Transactions are pretty simple to understand. This is our ERP data. It is the data that we maintain and track in our OLTP systems. It can be any record of any system-to-system or human-to-system interaction. It can even be a human-to-human interaction as long as it is captured electronically. We use a lot of this data in our analytics today.
- Interactions are the points in time at which we interact with a system. It could be a tweet or a Facebook post. It could be an electronic or paper customer satisfaction survey. Interactions are web logs and A/B tests. We have a lot of this data but typically no efficient way to understand or extract value from it.
- Observations are interesting because they represent a world of net new data sources that we never thought of analyzing. This is data that was once considered low- to medium-value, or even exhaust data that was too bulky and too expensive to store. It can be machine-generated data from sensors, web logs and clickstreams, or even audio/video and other largely unstructured content.
The Intersection Is Where Things Get Interesting
This “value” definition of big data gets interesting when you substitute the plus signs in Shaun’s definition with intersections…
Big Data = Transactions ∩ Interactions ∩ Observations.
With big data technology (Apache Hadoop being one example) we can now efficiently store and process all of this data. We can refine observation data down to the salient details that may be interesting in the context of our EDW. But even more interesting, we can ask these big data systems new questions. We can combine data across all these types and come up with new value for organizations. There is a world of data in our organizations that is used for a single explicit purpose. When we start to combine things, the big data world gets really interesting.
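To make the intersection idea a little more concrete, here is a minimal Python sketch of combining the three data types. Everything in it is an assumption made for illustration: the sample records, the field names, the `customer_id` join key, and the `index_by_customer` and `combine` helpers are all hypothetical, and a real Hadoop job would look quite different. The point is only that the customers who show up in all three sources are the ones we can ask new questions about.

```python
# Hypothetical sample records; all field names and values are invented for illustration.
transactions = [
    {"customer_id": "c1", "order_total": 120.0},
    {"customer_id": "c2", "order_total": 35.5},
    {"customer_id": "c3", "order_total": 80.0},
]
interactions = [
    {"customer_id": "c1", "channel": "survey", "sentiment": "negative"},
    {"customer_id": "c3", "channel": "tweet", "sentiment": "positive"},
]
observations = [
    {"customer_id": "c1", "page": "/returns", "dwell_seconds": 240},
    {"customer_id": "c2", "page": "/home", "dwell_seconds": 12},
]


def index_by_customer(records):
    """Group a list of records by their customer_id field."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["customer_id"], []).append(record)
    return grouped


def combine(transactions, interactions, observations):
    """Return a combined view for customers present in all three sources."""
    tx = index_by_customer(transactions)
    ix = index_by_customer(interactions)
    ob = index_by_customer(observations)
    # Set intersection: only customers seen in every data type.
    shared = set(tx) & set(ix) & set(ob)
    return {
        cid: {
            "transactions": tx[cid],
            "interactions": ix[cid],
            "observations": ob[cid],
        }
        for cid in sorted(shared)
    }


combined = combine(transactions, interactions, observations)
```

In this made-up data only one customer appears in all three sources, so `combined` holds a single entry that puts an order, a negative survey response, and a long visit to the returns page side by side. None of those records says much on its own; together they suggest a question worth asking.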
If you’re using Hadoop to create value from your big data, why not check out our Hadoop Patterns of Use whitepaper and see how it can work for you?