February 25, 2014

Modern Advertising Architectures Built with Hadoop

This is the fifth in our series on modern data architectures across industry verticals.

Consumers have never generated so much data on how they research, discuss and buy products. This new data is valuable for shaping and promoting a brand or product, but it doesn’t line up neatly to fit in pre-defined, tabular formats.

Apache Hadoop and Hortonworks Data Platform (HDP) bring this “new” data under analysis by ingesting social media, clickstream, video and transaction data without requiring a pre-defined data schema.
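To make "without requiring a pre-defined data schema" concrete, here is a minimal Python sketch of the schema-on-read idea: raw records are kept exactly as they arrive, and structure is applied only when a question is asked. The record layouts, field names and the `events_for_user` helper are hypothetical illustrations, not taken from any customer data set, and plain Python stands in for whatever engine would run this on a cluster.

```python
import json

# Hypothetical raw events from three different sources, stored as received.
# Note that no two records share the same set of fields.
RAW_LINES = [
    '{"source": "clickstream", "user": "u1", "url": "/toys/robot"}',
    '{"source": "social", "user": "u2", "text": "love this robot", "likes": 3}',
    '{"source": "pos", "user": "u1", "sku": "robot-01", "amount": 19.99}',
]


def read_events(lines):
    """Parse each raw line at read time; no table schema is declared up front."""
    for line in lines:
        yield json.loads(line)


def events_for_user(lines, user):
    """Apply structure only when querying: use just the fields this question needs."""
    return [event["source"] for event in read_events(lines) if event.get("user") == user]


print(events_for_user(RAW_LINES, "u1"))
```

A new source with different fields can be appended to the raw store without changing anything that is already there; only the queries that care about the new fields need to know about them.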

Now media companies, agencies and enterprises can store new types of data, merge it with existing data and retain everything for longer.

This improves advertising for customer acquisition and loyalty, which yields better returns on advertising investment.

Advertisers do Hadoop.

The following reference architecture diagram represents a combination of approaches that we see our advertising and media customers adopt, whether they advertise groceries, home improvement programming, kids’ toys, or anything and everything on a retail website.


Here are some specific ways that our media and advertising customers use HDP to improve their bottom lines.

Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers

One marketing analytics company specializes in gathering insight at the checkout counter, across many grocers and drug stores. They mine this sales information for basket analysis, price sensitivity, and demand forecasts.
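As a rough illustration of what basket analysis computes (not this company's actual method), the Python sketch below counts how often pairs of items appear in the same checkout basket and reports each pair's support, meaning the share of baskets that contain both items. The sample baskets and the `pair_support` function are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets; each set holds the items on one receipt.
BASKETS = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "aspirin"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical ordering, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


support = pair_support(BASKETS)
for pair, share in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, share)
```

At the scale of terabytes of POS data, the same counting would be distributed across a cluster, but the quantity being measured is the same.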

Interactive query with the Stinger Initiative and Apache Hive running on YARN helps the company rapidly process terabytes of data to keep pace with a market that changes by the day.

Target Ads to Customers in Specific Cultural or Linguistic Segments

Hortonworks’ customer Luminar is the leading big data analytics and modeling provider uniquely focused on delivering actionable advertising insights on U.S. Latino consumers.

Luminar wanted to move beyond samples of Latinos living in the United States and toward empirical analysis of actual data on all U.S. Latinos. They wanted to acquire and save as many transactions as possible from as many different sources as possible.

Now HDP interacts easily with other components of Luminar’s data and business intelligence ecosystem: Amazon Cloud, R, Talend and Tableau. The company has increased its ingest of transaction data from 300 sources to 2,000, and from 2 terabytes to 15 terabytes per month. Previously, it took Luminar days to ingest and join a new set of raw data. Now it takes only hours, even with about eight times as much data as before.

Luminar uses that insight to craft marketing strategies for CPG and entertainment companies that want to focus on the U.S. Latino population.

Syndicate Videos According to Behavior, Demographics & Channel

A major omni-media company specializes in home improvement and DIY content distributed across television, digital, mobile and publishing channels. One of its divisions focuses on delivering online video ads.

Both content syndicators and publishers want to make sure that video content reaches the right audience. The company analyzes clickstream data stored on HDP for audience analysis that feeds a recommendation engine for improved ad consumption.

ETL Toy Market Research Data for Longer Retention & Deeper Insight

A leading consumer research firm provides consumer intelligence to the toy industry. The company delivers weekly point-of-sale (POS) tracking information for competitive insight on toy sales trends. They cover all the major toy retailers, for a complete view of the marketplace.

The company chose HDP to offload much of its data from a more expensive platform, with expected savings of more than $1 million annually. The improved economics allow the company to retain data longer and identify long-range, strategic opportunities for growth.

This helps its toy company clients partner more closely with retailers.

Optimize Online Ad Placement for Retail Websites

One of our customers provides web analytics services to some of the world’s largest retail websites. For their largest customer, clickstream data pours in at the rate of hundreds of megabytes per hour, which adds up to billions of rows per month.

The agency analyzes each ad’s placement and determines click-through and conversion rates. When impression files and click files were stored in a relational database, the agency had no way to intelligently connect impressions to clicks. So they had to guess.

Now HDP replaces that guesswork with empirical science and confident analysis by week, by day or by hour. The agency can also filter by the consumer’s OS, browser, device and geographical location. With Hadoop’s economies of scale, data storage costs are significantly lower than before, and data can be retained for longer. So the agency and its customers all look forward to looking back on years (not weeks) of clickstream data.
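The core of this analysis is a join between impressions and clicks. The Python sketch below shows one way that join could look: it matches clicks to impressions on a shared identifier, then computes click-through rate per ad per hour, with an optional filter on the consumer's OS. The field names (`impression_id`, `ad`, `ts`, `os`), the sample records and the `ctr_by_hour` function are all hypothetical; the agency's real file formats and join keys are not described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical impression and click records; real files would be far larger.
IMPRESSIONS = [
    {"impression_id": "i1", "ad": "A", "ts": "2014-02-25T10:05:00", "os": "iOS"},
    {"impression_id": "i2", "ad": "A", "ts": "2014-02-25T10:40:00", "os": "Windows"},
    {"impression_id": "i3", "ad": "A", "ts": "2014-02-25T11:10:00", "os": "iOS"},
    {"impression_id": "i4", "ad": "B", "ts": "2014-02-25T10:15:00", "os": "iOS"},
]
CLICKS = [
    {"impression_id": "i1"},
    {"impression_id": "i4"},
]


def ctr_by_hour(impressions, clicks, os_filter=None):
    """Return {(ad, hour): click-through rate}, optionally for a single OS."""
    clicked_ids = {click["impression_id"] for click in clicks}
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        if os_filter is not None and imp["os"] != os_filter:
            continue
        # Truncate the timestamp to the hour to form the grouping key.
        hour = datetime.fromisoformat(imp["ts"]).strftime("%Y-%m-%d %H:00")
        key = (imp["ad"], hour)
        shown[key] += 1
        if imp["impression_id"] in clicked_ids:
            clicked[key] += 1
    return {key: clicked[key] / shown[key] for key in shown}


for key, rate in sorted(ctr_by_hour(IMPRESSIONS, CLICKS).items()):
    print(key, rate)
```

Changing the `strftime` pattern to a date or an ISO week would give the by-day or by-week view, and the same filter approach extends to browser, device or location fields.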

The agency’s retail customers can now tell if consumers are clicking on their website while standing in one of their stores. This provides valuable insight to manage “showrooming” behavior, where customers visit a store to touch a product and then drive home to buy it online. Retailers can address showrooming without slashing prices, and data in HDP reveals specific tactics for doing so.
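One plausible way to flag in-store clicks, assuming each click carries a latitude and longitude and the retailer knows its store coordinates, is a simple distance check. The Python sketch below uses the haversine formula for great-circle distance. The store location, click records, 100-meter radius and function names are illustrative assumptions, not details of the agency's system.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

# Hypothetical store coordinates as (latitude, longitude).
STORES = {"store-1": (40.7580, -73.9855)}

# Hypothetical clicks with the location reported by the consumer's device.
CLICKS = [
    {"click_id": "c1", "lat": 40.7581, "lon": -73.9856},  # a few meters away
    {"click_id": "c2", "lat": 40.7000, "lon": -74.0100},  # several km away
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_store_clicks(clicks, stores, radius_m=100.0):
    """Return (click_id, store_id) pairs for clicks within radius_m of a store."""
    matches = []
    for click in clicks:
        for store_id, (lat, lon) in stores.items():
            if haversine_m(click["lat"], click["lon"], lat, lon) <= radius_m:
                matches.append((click["click_id"], store_id))
    return matches


print(in_store_clicks(CLICKS, STORES))
```

The radius is a tuning choice: too small and clicks from the far end of a large store are missed, too large and passers-by on the street are counted as shoppers.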

Watch our blog in the coming weeks for reference architectures in other industry verticals.

