December 21, 2015

A Very Hadoopy Christmas

Santa will be busy this year. On December 24th he’s scheduled to deliver presents to billions of children globally. Buddy and the Keeblers will be working overtime to meet the demand, and Santa has called in temp work from Legolas and Dobby.

There’s little doubt that Santa is a master of lean manufacturing, but there’s only so much muda you can cut from the factory floor. After all, his supply chain has been perfected over decades and his workforce is loyal and perfectly aligned with the mission.

But Santa’s market is changing, and big change requires Big Data. Santa Claus is well behind the curve here. What if we could create efficiencies that could save Santa time and money?

A Claus For Concern

Let’s start with the list Santa checks twice. He relies on paper to scribble the names of the naughty and the nice. But even if he cut out middle names and wrote in a 12pt font, at 500 words (or 250 names) per page, Santa would still use 7.6 million sheets of paper, which could mean as many as 760 downed trees.

Clearly, Santa needs to go digital or end up on the EPA’s naughty list.

But paper isn’t the only thing Santa is at risk of wasting. Each year’s Naughty or Nice List is a valuable historical record of prior behavior and gift preferences. Without easily retrievable Big Data, mistakes happen. Santa might deliver a second consecutive PlayStation or bicycle to the same nice child, causing confusion and possibly casting doubt on the big man’s omniscience. Even with an error rate of .0001%, that’s nearly two thousand kids who might receive the same gift as last year.

What about the presents given to undeserving children? No one talks about this, but we’ve all seen it happen. Frankly, Santa’s methodology for allocating resources is archaic at best.

Qualitative analysis of our children should be real-time and involve more than parental interviews or report cards. Recent analysis of actual child behavior by a Big Three consulting firm revealed a 15% error rate on the Naughty or Nice List. Some children’s designation changed 23 times in the same calendar year, and designation changes were most likely to occur in December.

For example, last year a child broke his 9:00pm bedtime on December 23rd. But due to data processing latencies, Santa didn’t know about it until the 27th, four days after the infraction and two days after this unworthy child received his dream gift.
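To see why that lag matters, here is a minimal Python sketch of the bedtime incident. Everything in it is an assumption made for illustration: the event layout, the `designation` helper, the year, and the exact delivery time are hypothetical, and a plain list stands in for whatever pipeline Santa actually runs. It only shows that an infraction Santa has not yet processed cannot affect the list at delivery time.

```python
from datetime import datetime, timedelta


def designation(events, as_of):
    """Return 'naughty' if any infraction was processed by `as_of`, else 'nice'."""
    known = [e for e in events if e["processed_at"] <= as_of]
    return "naughty" if known else "nice"


# Hypothetical timestamps: the year and the delivery hour are assumptions.
occurred = datetime(2014, 12, 23, 21, 0)   # bedtime broken at 9:00pm
delivery = datetime(2014, 12, 24, 23, 0)   # sleigh arrives on Christmas Eve

# Batch processing: the event only becomes visible four days later.
batch_events = [{
    "what": "broke bedtime",
    "occurred_at": occurred,
    "processed_at": occurred + timedelta(days=4),
}]

# Near-real-time processing: the same event is visible within seconds.
stream_events = [{
    "what": "broke bedtime",
    "occurred_at": occurred,
    "processed_at": occurred + timedelta(seconds=5),
}]

batch_verdict = designation(batch_events, delivery)
stream_verdict = designation(stream_events, delivery)
print("batch:", batch_verdict, "| stream:", stream_verdict)
```

With the four-day lag the child still looks nice on Christmas Eve. With the low-latency feed the same event flips the verdict before the sleigh arrives.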

Regarding the quality of analysis, Santa must ingest different kinds of data for advanced sentiment analysis that goes beyond self-reported behavior. How about cellphone metadata? Social media accounts? Even the brightest students can harbor dark secrets:

  • Little Angela sends text messages during class
  • Little Timmy spends his nights trolling on Twitter
  • Little Sophie leaked Star Wars spoilers on Reddit
  • Little Johnny leaves less-than-constructive YouTube comments

Santa’s list must go digital, and both Apache™ Hadoop® and Apache NiFi will help him modernize his data architecture.

Santa needs advanced data ingest capabilities. He needs new Data Discovery tools, a Single View of each child’s behavioral patterns, and even Predictive Analytics to make sure that he can respond nimbly to last-minute updates to wish lists.

Coal: The Hottest Toy This Christmas

Making a list and checking it twice is fine, if you’re shopping for groceries. But Santa has to determine whether 1.9 billion children are naughty or nice. If that took him 5 seconds per child, his list would take over 300 uninterrupted years to complete.
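For anyone who wants to check Santa’s math, the back-of-the-envelope figures in this post can be reproduced with a few lines of Python. The inputs come from the post itself, with two assumptions spelled out: a name counts as two words (500 words or 250 names per page), and one tree yields about 10,000 sheets, which is simply what 7.6 million sheets and 760 trees imply. The variable names are made up for this sketch.

```python
# Inputs quoted in the post.
CHILDREN = 1_900_000_000
NAMES_PER_PAGE = 250             # 500 words per page, two words per name
SECONDS_PER_CHILD = 5
DUPLICATE_RATE_PER_MILLION = 1   # .0001% is one in a million

# Assumption: implied by 7.6 million sheets -> 760 trees.
SHEETS_PER_TREE = 10_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

sheets = CHILDREN // NAMES_PER_PAGE
trees = sheets // SHEETS_PER_TREE
duplicate_gifts = CHILDREN * DUPLICATE_RATE_PER_MILLION // 1_000_000
years_to_check = CHILDREN * SECONDS_PER_CHILD / SECONDS_PER_YEAR

print(f"sheets of paper: {sheets:,}")
print(f"trees: {trees:,}")
print(f"duplicate gifts: {duplicate_gifts:,}")
print(f"years to check the list once: {years_to_check:.0f}")
```

The sheet and tree counts match the figures above. The duplicate-gift count comes to 1,900, and a single pass through the list takes a little over 300 years.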

That’s where Open Enterprise Hadoop helps with Santa’s pre-Christmas data discovery. Through Hadoop, Santa can ingest qualitative data of the world’s children in both structured and unstructured formats: everything from report cards to social media accounts to cellphone metadata.

Once this data is centralized into a Hadoop cluster, Santa can rapidly process this list and distinguish the naughty children from the nice. This Single View of data enables the clearest analytical picture by combining siloed column-and-row data and enriching it with newer, less structured types of data.
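As a rough illustration of the Single View idea, the sketch below joins structured rows with signals pulled from free text, keyed by child. It is plain Python rather than anything Hadoop-specific. The records, field names, `build_single_view` helper, and keyword list are all hypothetical, and the keyword match is only a toy stand-in for real sentiment analysis.

```python
from collections import defaultdict

# Hypothetical structured records, e.g. rows from a report-card table.
report_cards = [
    {"child_id": 1, "name": "Angela", "grade_avg": 3.9},
    {"child_id": 2, "name": "Timmy", "grade_avg": 3.7},
    {"child_id": 3, "name": "Maria", "grade_avg": 3.2},
]

# Hypothetical unstructured events, e.g. free text from phones and social feeds.
events = [
    {"child_id": 1, "source": "sms", "text": "texting during class again"},
    {"child_id": 2, "source": "twitter", "text": "trolling strangers all night"},
    {"child_id": 2, "source": "twitter", "text": "helped a friend with homework"},
]

# Toy keyword list standing in for real sentiment analysis.
NAUGHTY_WORDS = {"texting", "trolling", "spoilers"}


def build_single_view(cards, raw_events):
    """Combine each structured row with flags derived from that child's free text."""
    flags = defaultdict(int)
    for event in raw_events:
        words = set(event["text"].lower().split())
        if words & NAUGHTY_WORDS:
            flags[event["child_id"]] += 1

    view = {}
    for card in cards:
        naughty_flags = flags[card["child_id"]]
        view[card["child_id"]] = {
            **card,
            "naughty_flags": naughty_flags,
            "verdict": "naughty" if naughty_flags > 0 else "nice",
        }
    return view


single_view = build_single_view(report_cards, events)
for record in single_view.values():
    print(record["name"], record["verdict"])
```

Good grades alone would put all three children on the nice list. The enriched view is what catches Angela and Timmy.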

If you’ve waited in line at a retailer lately, or sat in a movie theater, you know that not every child deserves a very merry Christmas. Some deserve coal.

Through the power of Hadoop, these children can be quickly identified, and advanced data science algorithms can update Santa’s list in real time—even as his sleigh speeds through the frosty December night.

The Gift That Keeps on Giving

Most don’t know this, but even Santa has a budget. (Thank you, Switzerland.)

Costs matter, and Hadoop substantially reduces long-term storage costs. This means that Santa can retain data for many years and access this data whenever required.

Let’s consider the households that believe it’s better to receive than to give. I’m not saying Santa should play favorites, but his gifts are a finite resource. If all he’s getting in return are gluten-free biscuits and soy milk, Santa needs to prioritize accordingly. Now he can have the data to do so.

Terabytes of historical data on everything from flight times, to cookie types, to chimney blockages—all of this can help keep Santa jolly as he maximizes merriness around the world. All of this data is important, and so it must not be forgotten.

And to All a Good Night

Santa’s a busy guy, but this December he should be evaluating both Hortonworks Data Platform and Hortonworks DataFlow, looking for ways that these Connected Data Platforms can help him do his job faster, better, and cheaper.

By starting his journey to Hadoop, Saint Nick can save time on Christmas Eve for what really matters: an eggnog martini by a roaring fireplace with Mrs. Claus.
