February 11, 2014

Modern Retail Architectures Built with Hadoop

This is the fourth in our series on modern data architectures across industry verticals.

We’ve probably all heard the famous quote by John Wanamaker, the father of modern advertising: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”

Wanamaker would love Apache Hadoop for retail applications, because it diminishes (or eliminates) the dilemma he described.

When Hadoop is integrated with modern retail operations, it dramatically reduces the cost of capturing, ingesting, storing and analyzing data.

Now Wanamaker wannabes can analyze enough data to make statistically confident observations on empirical retail data, rather than rolling the dice with customer panels, in-store surveys or focus groups to guess what drives sales.

The following reference architecture diagram represents a combination of approaches that we see our retail customers adopt, whether they sell automobiles, ladders, shirts or shoes.


With a modern data architecture built on Hadoop, retailers of all sorts can pursue use cases like the five below, which are among the most common ways we see retailers use Hadoop.

Build a 360° View of the Customer

Retailers interact with customers across multiple channels, yet customer interaction and purchase data is often isolated in data silos. Few retailers can accurately correlate eventual customer purchases with marketing campaigns and online browsing behavior.

Apache Hadoop gives retailers a 360° view of customer behavior. It lets them store data longer, join it with other data sets, and identify phases of the customer lifecycle. Better customer analytics helps to increase sales, reduce inventory expenses and retain the best customers.
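To make "join it with other data sets" concrete, here is a minimal sketch in plain Python of the kind of per-customer join involved. It is only an illustration of the logic: on a Hadoop cluster this would more likely be expressed as a Hive or Pig join over much larger tables. The record layouts and field names (`customer_id`, `sku`, `campaign`, `url`) are assumptions made up for the example, not a schema from any real deployment.

```python
from collections import defaultdict


def build_customer_view(purchases, campaign_touches, page_views):
    """Merge per-channel records into one profile per customer.

    Each argument is a list of dicts that share a hypothetical
    'customer_id' key; the other field names are illustrative only.
    """
    view = defaultdict(lambda: {"purchases": [], "campaigns": [], "pages": []})
    for record in purchases:
        view[record["customer_id"]]["purchases"].append(record["sku"])
    for record in campaign_touches:
        view[record["customer_id"]]["campaigns"].append(record["campaign"])
    for record in page_views:
        view[record["customer_id"]]["pages"].append(record["url"])
    return dict(view)


if __name__ == "__main__":
    profile = build_customer_view(
        purchases=[{"customer_id": 1, "sku": "shoe-42"}],
        campaign_touches=[{"customer_id": 1, "campaign": "spring-email"}],
        page_views=[{"customer_id": 1, "url": "/shoes"}],
    )
    print(profile[1])
```

Once the channels are keyed on the same customer identifier, questions such as "which campaigns preceded a purchase?" become simple lookups on a single profile.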

Analyze Brand Sentiment

Enterprises lack a reliable way to track their brand health. It is difficult to analyze how advertising, competitor moves, product launches or news stories affect the brand. Internal brand studies can be slow, expensive and flawed.

Apache Hadoop enables quick, unbiased snapshots of brand opinions expressed in social media. Retailers can analyze sentiment in real time from Twitter, Facebook, LinkedIn or industry-specific social media streams. With a better understanding of customer perceptions, they can align their communications, products and promotions.
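As a rough illustration of what scoring brand sentiment can mean, the sketch below uses a tiny word-list approach in plain Python. This is a deliberately simple stand-in, not the method any particular retailer uses: production pipelines typically rely on trained models and run over streams stored in Hadoop. The word lists, the brand name "Acme" and the function names are all hypothetical.

```python
import re

# Toy sentiment lexicons; these word lists are assumptions for the example.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "awful", "broken"}


def sentiment_score(text):
    """Return (# positive words) - (# negative words) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def brand_sentiment(posts, brand):
    """Average score of posts that mention the brand, or None if none do."""
    scores = [sentiment_score(p) for p in posts if brand.lower() in p.lower()]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    posts = [
        "I love Acme shoes, great fit",
        "Acme support is awful",
        "Nothing about the brand here",
    ]
    print(brand_sentiment(posts, "Acme"))
```

Tracking this average over time, and around events such as a product launch, gives a crude version of the brand-health snapshot described above.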

Localize & Personalize Promotions

Retailers that can geo-locate their mobile subscribers can deliver localized and personalized promotions. This requires connections with both historical and real-time streaming data.

Apache Hadoop brings the data together to inexpensively localize and personalize promotions delivered to mobile devices. Retailers can develop mobile apps to notify customers about local events and sales that align with their preferences and geographic location (even down to a particular section in a specific store).
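The matching step behind a localized, personalized promotion can be sketched as a filter on two things: the shopper's stated interests (historical data) and the distance between the shopper's current position and the promotion (real-time data). The example below is a minimal sketch in plain Python using the haversine great-circle formula; the record fields, the 1 km default radius and the coordinates are assumptions chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_promotions(customer, promotions, radius_km=1.0):
    """Titles of promotions that match the customer's interests and are in range.

    'customer' and each promotion are dicts with hypothetical fields.
    """
    return [
        promo["title"]
        for promo in promotions
        if promo["category"] in customer["interests"]
        and haversine_km(customer["lat"], customer["lon"], promo["lat"], promo["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    customer = {"lat": 40.7505, "lon": -73.9895, "interests": {"shoes"}}
    promotions = [
        {"title": "Sneaker sale", "category": "shoes", "lat": 40.7508, "lon": -73.9890},
        {"title": "Boot sale", "category": "shoes", "lat": 41.0000, "lon": -74.0000},
        {"title": "Ladder sale", "category": "ladders", "lat": 40.7505, "lon": -73.9895},
    ]
    print(nearby_promotions(customer, promotions))
```

In-store targeting at the level of a department would use a finer-grained position source than GPS coordinates, but the filter has the same shape.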

In time for the 2013 holiday shopping season, Macy’s launched a test in two flagship stores with Apple’s iBeacon technology. This article describes how, “down the road, Macy’s might also ping shoppers on a department-by-department basis, possibly telling them about sneaker sales when they’re in the shoe section, or even recommending nearby products.”

Optimize Websites

Online shoppers leave billions of clickstream data trails. Clickstream data can tell web retailers which pages customers visit and what they buy (or don’t buy). But at scale, the huge volume of unstructured weblogs is difficult to ingest, store, refine and analyze for insight. Relational databases are not well suited to storing this clickstream data.

Apache Hadoop can store all web logs, for years, at a low cost. Web retailers use information in that data to understand user paths, do basket analysis, run A/B tests and prioritize site updates. This improves online conversion and revenue.
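Basket analysis, at its simplest, means counting which items show up together in the same order. The sketch below shows that counting step in plain Python as an illustration under stated assumptions: the baskets are small in-memory lists with made-up item names, whereas on a cluster the same logic would run as a distributed job over years of logs.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    baskets = [
        ["shirt", "shoes"],
        ["shoes", "shirt", "ladder"],
        ["ladder"],
    ]
    for pair, n in pair_counts(baskets).most_common():
        print(pair, n)
```

Pairs with high counts are candidates for cross-promotion or for placing next to each other on a product page.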

Redesign Store Layouts

In-store layout and product placement affect sales. Retailers often hire extra staff to make up for a sub-optimal store layout (e.g. “Are you finding what you need?”). Brick-and-mortar stores lack “pre-cash register” data about what in-store shoppers do before they buy.

In-store sensors, RFID tags & QR codes can fill that data gap, but they generate a lot of data. Apache Hadoop can store that huge volume of unstructured sensor and location data. Once analyzed, the resulting intelligence allows retailers to reduce costs and simultaneously improve customer in-store satisfaction. This improves same-store sales and customer loyalty.
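One example of the "pre-cash register" intelligence that sensor data can yield is dwell time per store zone. The sketch below is a minimal illustration in plain Python, assuming each sensor reading is a `(shopper_id, zone, timestamp_seconds)` tuple; that layout, the zone names and the function name are hypothetical. Time between two consecutive readings for the same shopper is credited to the zone of the earlier reading.

```python
from collections import defaultdict


def dwell_seconds_by_zone(pings):
    """Total seconds shoppers spent in each zone.

    'pings' is an iterable of (shopper_id, zone, timestamp_seconds) tuples
    in any order. A shopper's last reading has no following reading, so it
    contributes no dwell time.
    """
    by_shopper = defaultdict(list)
    for shopper_id, zone, ts in pings:
        by_shopper[shopper_id].append((ts, zone))

    totals = defaultdict(float)
    for events in by_shopper.values():
        events.sort()  # order each shopper's readings by timestamp
        for (t0, zone), (t1, _next_zone) in zip(events, events[1:]):
            totals[zone] += t1 - t0
    return dict(totals)


if __name__ == "__main__":
    pings = [
        ("a", "entrance", 0),
        ("a", "shoes", 30),
        ("a", "checkout", 150),
        ("b", "shoes", 10),
        ("b", "checkout", 70),
    ]
    print(dwell_seconds_by_zone(pings))
```

Zones with long dwell times but few purchases, or zones shoppers rarely reach, are natural places to start when rethinking a layout.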

Watch our blog in the coming weeks for reference architectures in other industry verticals.

