Using Machine & Sensor Data in Hadoop… Have we digitized the world?
‘The world is being digitized’ proclaimed Geoffrey Moore in his keynote at Hadoop Summit 2012 over a year ago. His belief is that we are moving away from an analog society, where we collect only casual recordings of events, to a digital one, where everything is captured. It is our belief that Hadoop is one of the key technologies powering this shift to a digital society.
There is almost an expectation that we capture the pics, vids and conversations that run before us. Have you been to a 3rd grade play or concert lately? The glow of our mobile displays is ubiquitous as we compete to get the best shot of our kids. We capture it all. We have also grown accustomed to capturing our continuous business data such as clickstreams and transactions … but there is more.
What impact and opportunity do we find here?
Today, we not only capture these life events but we store them for a longer period of time. Wouldn’t it be great to analyze years of Black Friday clickstream data? And not only are we capturing data for longer periods of time, we are also capturing all of it. A year ago, my friend John Kreisa used the term “exhaust data” when explaining these concepts and it has stuck with me. How much data have we thrown on the floor of our data centers over the past 30 or 40 years? It is possible that a nugget of value was hidden in it. Sensor data has been exhaust for years.
All of this data is interesting, but there is a world of microprocessors out there creating data today that might prove valuable. We can walk through a “day in the life” and see sensors everywhere. From the refrigerator that stores our half & half for the morning coffee, to the car we drive to work in, to the supply chain we may work on and the phones we have in our back pockets. Nearly every process and tool around us is creating data… massive amounts of it.
Making sense of the mountain
We now think of this data in terms of petabytes and zettabytes of captured information. The challenge we face is to formulate the “right” questions to ask of that information. Every set of data is different, but we can look at what others are doing to help find our own north star. Here are some examples of how others are using sensor data today:
- Predictive Analytics and Proactive Maintenance
The ability to predict equipment failure (and respond proactively) is extraordinarily valuable because it is far less expensive to do preventative maintenance than it is to pay for emergency repair or replacement equipment under duress. If a restaurant’s refrigerator fails, the franchise loses thousands of dollars in spoiled food and a day’s revenue. Fixed assets such as cellular transmission towers are difficult and expensive to replace, yet they exist to transmit data, so sensors can transmit diagnostic data that helps prolong the life of those assets. Algorithms can process massive amounts of sensor signals to identify previously invisible, subtle patterns indicating when an inexpensive repair is likely to prevent a costly replacement.
- Improving Research and Health
Since 2007, Children’s Hospital Los Angeles has collected sensor data from its pediatric intensive care units, sampled from each patient every 30 seconds. This dataset includes more than one billion individual measurements. Doctors plan to use this data to diagnose and predict medical episodes with greater precision. According to one of the researchers, the difficulty is to find medically useful patterns because “there are an infinite number of trivial patterns, such as people who tend to have babies are female and people over six-feet tall are over five-feet tall.”
There are hundreds, maybe thousands, of uses for sensor data. My favorite example is the rail operator who has equipped rails with sensors that collect the sound frequencies as a train travels over a section of rail, looking for discrepancies that point to potential issues within the system. Awesome.
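To make the "look for discrepancies" idea a little more concrete, here is a minimal sketch of one simple way to flag unusual sensor readings: compare each new value against the mean and standard deviation of the last few readings. This is not the method used by any of the organizations mentioned above. The function name `flag_anomalies`, the window size, the threshold and the sample temperatures are all assumptions made up for illustration.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate sharply from the recent baseline."""
    recent = deque(maxlen=window)  # the last `window` readings seen so far
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            spread = stdev(recent)
            # Skip the check when the window is perfectly flat (spread of zero).
            if spread > 0 and abs(value - baseline) > threshold * spread:
                flagged.append(i)
        recent.append(value)
    return flagged


# Hypothetical refrigerator temperatures in degrees Celsius, one reading per minute.
temps = [3.9, 4.1, 4.0, 4.2, 3.8, 4.0, 4.1, 7.5, 4.0, 3.9]
print(flag_anomalies(temps))  # prints [7], the index of the 7.5 reading
```

A real system would need something more robust (an outlier stays in the window and inflates the spread for the next few readings), but the shape of the question is the same: what does normal look like, and which readings fall outside it?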
How to get started?
With Hadoop, it is much easier to refine the data and explore it to find the meaningful patterns and exclude the trivial ones. As Hadoop extends the storage and analysis of big data beyond commercial Internet use cases, it can augment other efforts, and even help save lives, through prediction, identification and prevention.
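As a rough illustration of what "refining" sensor data can look like, the sketch below walks through the map, shuffle and reduce pattern that Hadoop's MapReduce model is built around, simulated locally in plain Python. It does not call any Hadoop API. The record layout (`sensor_id,timestamp,value`), the function names and the sample rows are assumptions invented for this example.

```python
from itertools import groupby
from operator import itemgetter


def map_reading(line):
    """Map step: turn one raw CSV line into a (sensor_id, value) pair."""
    sensor_id, _timestamp, value = line.strip().split(",")
    return sensor_id, float(value)


def reduce_readings(sensor_id, values):
    """Reduce step: summarize all values seen for one sensor as (id, min, max, mean)."""
    values = list(values)
    return sensor_id, min(values), max(values), sum(values) / len(values)


def run_local(lines):
    """Simulate the shuffle locally: sort mapped pairs by key, then group by key."""
    pairs = sorted(
        (map_reading(line) for line in lines if line.strip()),
        key=itemgetter(0),
    )
    return [
        reduce_readings(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


# Hypothetical raw readings: sensor id, ISO timestamp, temperature in Celsius.
sample = [
    "fridge-1,2013-06-01T08:00:00,4.0",
    "fridge-2,2013-06-01T08:00:00,3.0",
    "fridge-1,2013-06-01T08:01:00,6.0",
    "fridge-2,2013-06-01T08:01:00,5.0",
]
for row in run_local(sample):
    print(row)
```

On a cluster the map and reduce steps would run in parallel across many machines and the framework would handle the sort-and-group step in between. The per-sensor summaries that come out are the kind of refined data you can then explore for patterns.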
The bigger question is what sensor data do you have and what questions can you ask of it? The technology is ready.