September 30, 2014

Webinar Recap “Retail Insights: What’s Possible With a Modern Data Architecture”

Last week’s Hortonworks webinar “What’s Possible with a Modern Data Architecture?” featured Greg Girard, program director for omni-channel analytics strategies at IDC Retail Insights, and Mark Ledbetter, vice president for industry solutions at Hortonworks. Greg provides targeted, fact-based guidance to retailers on the application of analytics across the enterprise. Mark has more than twenty-five years’ experience in the software industry with a focus on retail and supply chains.

Many of Greg and Mark’s thoughts from the webinar echo topics also covered in the recent Hortonworks white paper “The Retail Sector Boosts Sales with Hadoop.”

Download White Paper

Greg discussed the most significant drivers of big data initiatives in the retail industry, including customer acquisition, pricing strategies, and competitive intelligence. Mark then explained how a modern data architecture with Hortonworks Data Platform helps high-achieving retailers use data to build sustained competitive advantages.

Here is the complete recording of the Webinar.

Here is Greg’s presentation, and this is Mark’s presentation.

We’re grateful to the many participants who joined this webinar and asked excellent questions. Here’s the Q & A from the webinar (including some questions that we did not have time to answer on the call):

How are all the issues of data quality and compatibility handled by Hadoop across an organization?

There are several technologies available for data ingestion into the Hadoop environment.

Much of the ETL processing is now being moved into the Hadoop environment, because it can be done faster and more cheaply than with traditional methods.
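As a concrete illustration, an ETL step on Hadoop is often expressed as a map phase that parses and re-keys raw records and a reduce phase that aggregates them. The sketch below mimics that split in plain Python over an in-memory sample; the CSV field names are hypothetical.

```python
# A minimal Hadoop Streaming-style ETL sketch: the mapper parses raw
# sales lines into (store_id, amount) pairs, and the reducer sums
# amounts per store. Field names and sample data are hypothetical.

from itertools import groupby
from operator import itemgetter

def map_line(line):
    """Parse a raw CSV sales record: store_id,sku,amount."""
    store_id, _sku, amount = line.strip().split(",")
    return store_id, float(amount)

def reduce_pairs(pairs):
    """Sum amounts per store key, grouping after a sort (as Hadoop would)."""
    totals = {}
    sorted_pairs = sorted(pairs, key=itemgetter(0))
    for store_id, group in groupby(sorted_pairs, key=itemgetter(0)):
        totals[store_id] = sum(amount for _, amount in group)
    return totals

raw = ["s1,sku9,10.50", "s2,sku3,4.00", "s1,sku1,2.25"]
totals = reduce_pairs(map_line(line) for line in raw)
print(totals)  # {'s1': 12.75, 's2': 4.0}
```

In a real cluster the same two functions would run as the mapper and reducer of a Hadoop Streaming job, with HDFS handling the intermediate sort and shuffle.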

Specific applications within the Hortonworks Data Platform help with data integration and governance.

I get the sense that a Hadoop implementation is intended to be an application-specific solution, rather than a general use solution like a traditional data warehouse. Can you comment?

Actually, the important thing to understand is that a Hadoop data lake in a retail environment is an opportunity to bring together all of the data sources that you have not been able to combine in the past. Not only can you bring your traditional operational reporting into Hadoop, but you can also link that to non-traditional sources of data. That may be sales data tied to customer data, clickstream data, and social media data. Then your data scientists can look for detailed trends and insights that you haven’t been able to get before.

Now that we can get away from the cubed and aggregated data that we’ve had in traditional environments, we can go looking for new insight (without having to guess the answer to the question before we start the interrogation).
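The linking described above can be sketched as a simple join across sources that were previously siloed: index clickstream events by customer, then attach them to each sale. All record shapes here are invented for illustration.

```python
# Sketch: enriching sales records with clickstream events by customer id,
# the kind of cross-source join a data lake makes possible. The record
# fields (customer_id, sku, page, dwell_s) are hypothetical.

sales = [
    {"customer_id": "c1", "sku": "boots", "amount": 80.0},
    {"customer_id": "c2", "sku": "hat", "amount": 15.0},
]
clicks = [
    {"customer_id": "c1", "page": "/boots", "dwell_s": 42},
    {"customer_id": "c1", "page": "/socks", "dwell_s": 5},
]

# Index clickstream events by customer, then attach them to each sale.
clicks_by_customer = {}
for event in clicks:
    clicks_by_customer.setdefault(event["customer_id"], []).append(event)

enriched = [
    {**sale, "clicks": clicks_by_customer.get(sale["customer_id"], [])}
    for sale in sales
]
print(len(enriched[0]["clicks"]))  # 2 page views linked to c1's purchase
```

At data-lake scale the same join would run as a Hive query or MapReduce job over HDFS rather than in-memory dictionaries.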

How are retail organizations balancing the use of Hadoop with their operational reporting solutions?

It’s not an “or”, it’s an “and”. Hadoop can be purposed for operational, batch reporting as well as ad hoc, interactive investigation.

Research by IDC Retail Insights found that it’s imperative to balance your acquisition of data scientists and business analysts. The key thing is to give both of those roles access to the tools of their trade.

Unfortunately, both of those skill sets are in short supply, and a Hadoop strategy can be a compelling way to attract that talent into your retail organization.

There needs to be a balance of tool development between structured data reporting and explorative analytics. We can augment existing EDW systems with Hadoop to provide a full range of insight.

Please explain how Hortonworks approaches data security (for example with PCI standards).

The Hortonworks Data Platform provides a rich data storage environment that scales linearly and can accommodate extremely large and diverse volumes of data. This data is stored in a fault-tolerant and robust fashion that gives organizations the peace of mind required for management of large datasets.

PCI standards are associated with protecting consumer identity and payment card information, and are specifically designed to protect financial data and consumer information. Customers using Hortonworks Data Platform achieve compliance with PCI standards through specific configuration of the platform, tokenization of sensitive data, and partnerships with Hortonworks encryption partners.
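The tokenization approach mentioned above replaces a card number with an opaque token before the record ever lands in the cluster, keeping the token-to-card mapping in a separate, secured vault. The sketch below illustrates the idea only; it is not a PCI-compliant implementation, and the class and field names are hypothetical.

```python
# Illustrative tokenization sketch: the Hadoop cluster only ever sees
# the token; reversing it requires the separately secured vault.
# NOT a PCI-compliant implementation -- for illustration only.

import secrets

class TokenVault:
    """Maps card numbers (PANs) to random tokens; only the vault can reverse."""
    def __init__(self):
        self._pan_to_token = {}
        self._token_to_pan = {}

    def tokenize(self, pan: str) -> str:
        # Reuse the existing token so the same PAN always maps consistently.
        if pan not in self._pan_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._pan_to_token[pan] = token
            self._token_to_pan[token] = pan
        return self._pan_to_token[pan]

    def detokenize(self, token: str) -> str:
        return self._token_to_pan[token]

vault = TokenVault()
record = {"customer": "c1", "pan": "4111111111111111"}
safe_record = {**record, "pan": vault.tokenize(record["pan"])}
# safe_record carries only the token; the real PAN never enters Hadoop.
```

Production deployments would pair this pattern with encryption at rest and in transit, which is where the encryption partnerships come in.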

Security is woven throughout Hortonworks Data Platform, as part of our approach to delivering open-source development projects for the enterprise. You can find full details on our approach on our Hortonworks security page.

Is the data lake a replacement for the enterprise data warehouse or is it a complementary structure?

Hortonworks was founded with the fundamental belief that Apache Hadoop should be a core component of a modern data architecture, integrating with and complementing existing EDW systems.

According to that strategy, Hortonworks partner programs are designed to expand, support and accelerate the growth of a vibrant Apache Hadoop ecosystem by providing technical enablement, joint marketing opportunities, design assistance, technical support and training.

Low-value computing tasks such as ETL – which can consume significant EDW resources – can be offloaded to Hadoop and performed much more cost efficiently, freeing up the data warehouse to perform the truly high-value functions, such as analytics and operations, that best leverage its advanced capabilities.

Hortonworks is focused on the deep integration of Hadoop with your existing data center technologies and team capabilities. To achieve this, Hortonworks has built strategic co-engineering and reselling partnerships with leaders in the EDW space, including HP, Microsoft, Rackspace, SAP, and Teradata.

Can you elaborate on the Call Center Productivity use case?

Call centers typically handle customer support, upsell, and cross-sell. Mark described how Hadoop data can recommend the next best product in an upsell situation. By storing data in HDFS from multiple disparate sources (consumer profiles, demographic information, and order history) it is possible to predict the next best product for the customer. The call center agent can deliver those recommendations via a real-time script or in an email response to the customer.
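One simple way such a next-best-product prediction can be built from order history is item co-occurrence: count how often products appear in the same order, then recommend the most frequent companion of a product the customer already has. The sketch below uses invented data; real systems would compute these counts at scale over HDFS.

```python
# Sketch: next-best-product via co-occurrence counts over order history.
# Orders and product names are invented for illustration.

from collections import Counter
from itertools import permutations

orders = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"laptop", "mouse"},
]

# Count ordered pairs of products bought together.
co_counts = Counter()
for order in orders:
    for a, b in permutations(order, 2):
        co_counts[(a, b)] += 1

def next_best(product):
    """Most frequent companion of `product`, or None if unseen."""
    companions = [(n, b) for (a, b), n in co_counts.items() if a == product]
    return max(companions)[1] if companions else None

print(next_best("printer"))  # 'ink' (bought together twice)
```

A call-center script would then surface `next_best(...)` for whatever the caller just ordered.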
Are any retail clients using a combination of iBeacon with Hadoop?

Yes. Hortonworks does have retail customers using iBeacon with Hadoop. One large retailer leverages iBeacon to capture the behavior of shoppers who have installed the app on their phones. Data streams into HDFS to indicate how customers move through the store, relative to product promotions. Historical data across all shoppers provides insight on store design and product placement.
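Once beacon pings land in HDFS, a typical first summary is unique-shopper traffic per store zone, the kind of aggregate that informs store design and product placement. Beacon zones and shopper ids below are hypothetical.

```python
# Sketch: aggregating iBeacon pings into unique-shopper counts per zone.
# Each ping is (shopper_id, beacon_zone); repeat pings from the same
# shopper in a zone count once. Zone names are hypothetical.

from collections import Counter

pings = [
    ("s1", "entrance"), ("s1", "shoes"), ("s2", "entrance"),
    ("s2", "shoes"), ("s2", "checkout"), ("s3", "entrance"),
]

zone_visitors = Counter()
for zone in {z for _, z in pings}:
    zone_visitors[zone] = len({s for s, z in pings if z == zone})

print(zone_visitors.most_common(1))  # [('entrance', 3)]
```

At production scale the same aggregation would run as a Hive or MapReduce job over the streamed ping data in HDFS.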
How can Apache Solr be used for search and metadata management in a retail environment?

Apache Solr is specifically designed to provide full-text search and real-time indexing. It is optimized for high-volume web traffic. The natural first choices for application in a retail environment would be to analyze clickstreams in online shopping behavior or to analyze sentiment of shoppers surfing an online catalog.

HDP 2.1 includes the true open-source Apache Solr 4.7.2 for Hadoop Search.
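Solr itself runs as a server, but the core structure it maintains — an inverted index mapping terms to documents — is easy to sketch. Below, hypothetical catalog pages are indexed and queried with an AND term lookup, the building block behind catalog search and clickstream term analysis.

```python
# Sketch of the inverted-index idea behind full-text search engines
# like Apache Solr: map each term to the set of documents containing it,
# then answer queries by set intersection. Catalog data is invented.

from collections import defaultdict

docs = {
    "p1": "waterproof hiking boots for winter",
    "p2": "lightweight running shoes",
    "p3": "winter boots on sale",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Documents containing all given terms (AND query)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(search("winter", "boots")))  # ['p1', 'p3']
```

Solr adds analysis pipelines, ranking, faceting, and distributed operation on top of this structure, which is what makes it suitable for high-volume retail traffic.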

Additional Resources for Retailers that Want to Do Hadoop

