It sounds like the Wild West, and when it comes to data, sometimes it looks like it too. The concept of “wrangling” brings to mind a lone cowboy on horseback, rounding up a herd of cattle. Back in the day, a really good wrangler could take charge of livestock, guiding them in the right direction and organizing along the way.
We can make the same analogy for data wrangling. Rather than a lone cowboy, visualize a frantic business analyst faced with disparate data sources from multiple organizations and a need to organize, index, and query a variety of elements. Cows may be fickle and need encouragement, and at times the work we do in Microsoft Excel seems to take the same effort as cajoling or cracking a whip!
You don’t have to look far before you find a compelling need for three things: a future-proof data platform, data wrangling, and visualization.
These requirements exist in all industries, but they are especially pronounced in the Consumer Packaged Goods (CPG) industry.
Pepsico is a leader in this industry and, like other CPG companies, faces distinct challenges in managing its supply chain and product demand. CPG organizations rely on close relationships with retailers to predict and manage both. This collaboration provides unique insight into the forecasting and replenishment of standard goods. It makes a difference in everything from planning “buy one get one” promotions to minimizing the risk that retailers have empty shelves when consumers arrive at the store to purchase the promoted item.
This process is called Collaborative Planning, Forecasting, and Replenishment (CPFR), and it requires data from all participants. CPG data covering UPC details, shelf life, and size supports shipping algorithms and determines how much space is required on the trucks delivering product. Data from retailers contains store codes, quantity on hand, and point-of-sale (POS) data. Additional data such as weather, events, and driver scores can be added to optimize delivery routes and manage issues in the supply chain.
The CPFR process is data-intensive. To be successful, it requires a future-proof data platform that truly supports all data, whether it is structured data created by an application or data warehouse, or unstructured data collected from social media, syndicated sources, or other services. All of this data is combined to provide a unique view into this complex business process.
The volume of this data is overwhelming, and so is the process of managing it. Combining multiple sources that use different ID systems can be very manual and resource-intensive. At Strata in 2015, Matt Derda of Pepsico shared how they used a series of macros in Microsoft Access to convert customer data into Excel, which then fed a series of queries on their internal servers. Hours and days were spent simply preparing the data. According to our partner Trifacta, over 80% of overall effort is spent preparing data rather than on the true objective, analysis.
Pepsico solved this complex CPFR problem by using Hortonworks Hadoop to store all collected data, Trifacta to wrangle it, and Tableau to deliver rich visualizations.
With this blend of technology, Pepsico gets faster access to reports and truly supports the “Collaborative” in its CPFR process.
They can quickly import customer-provided data, combine it with internal product data, and enrich it with social media, sentiment analysis, and other unstructured data points. This combined data set is assembled very quickly, using intuitive and approachable scripting logic that visualizes the data components and flags potential data errors caused by bad or missing characters.
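To make the idea concrete, here is a minimal sketch in plain Python of what "combine and flag" can look like. It is not Trifacta's scripting language or Pepsico's actual pipeline. The product master, the retailer-to-UPC crosswalk, the field names, and every value in it are hypothetical and exist only for illustration.

```python
# Hypothetical internal product master keyed by UPC (made-up values).
PRODUCTS = {
    "000000000011": {"name": "Cola 12-pack", "shelf_life_days": 90},
    "000000000028": {"name": "Potato chips", "shelf_life_days": 60},
}

# Hypothetical crosswalk from one retailer's item codes to internal UPCs.
# Each retailer with its own ID system would need its own mapping.
RETAILER_ITEM_TO_UPC = {
    "R-1001": "000000000011",
    "R-2002": "000000000028",
}


def wrangle(retailer_rows):
    """Split retailer rows into enriched records and rejected rows.

    Rows are dicts of strings, as they would arrive from a CSV export.
    Returns (clean, rejected), where rejected holds (row, reason) pairs.
    """
    clean, rejected = [], []
    for row in retailer_rows:
        item_code = (row.get("item_code") or "").strip()
        on_hand = (row.get("on_hand") or "").strip()
        upc = RETAILER_ITEM_TO_UPC.get(item_code)

        if upc is None or upc not in PRODUCTS:
            # The retailer's ID cannot be matched to an internal product.
            rejected.append((row, "unknown item code"))
        elif not on_hand.isdecimal():
            # Catches empty values and stray characters such as "1O".
            rejected.append((row, "bad or missing quantity"))
        else:
            clean.append({
                "store": row.get("store"),
                "upc": upc,
                "name": PRODUCTS[upc]["name"],
                "on_hand": int(on_hand),
            })
    return clean, rejected


rows = [
    {"store": "S1", "item_code": "R-1001", "on_hand": "40"},
    {"store": "S1", "item_code": "R-9999", "on_hand": "5"},   # unmapped ID
    {"store": "S2", "item_code": "R-2002", "on_hand": "1O"},  # letter O, not zero
]
clean, rejected = wrangle(rows)
```

Keeping the rejected rows alongside a reason, rather than silently dropping them, mirrors the point above: surfacing potential data errors is part of the wrangling work, not an afterthought.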
A consumer goods company manages customers at two levels: its primary customers, the retailers, and the end consumer. As a result, a CPG organization requires data at multiple levels to make business decisions.
At times, the CPFR process may actually lead to a recommendation to reduce order volumes. If a retailer is ordering 500 cases per week and only consumes 100 cases per week, the right recommendation is a reduced order count to prevent spoilage at the end of the year. This is a byproduct of a truly collaborative relationship within the customer demand chain.
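The arithmetic behind that recommendation is simple enough to sketch. The function below is an illustrative assumption, not a CPFR standard or Pepsico's actual logic: it takes the weekly order and weekly consumption, reports the surplus that builds up, and suggests an order equal to consumption plus an optional buffer. The function name, the `buffer_per_week` parameter, and the 52-week horizon are all hypothetical.

```python
def order_recommendation(ordered_per_week, consumed_per_week,
                         buffer_per_week=0, weeks=52):
    """Compare weekly orders with weekly consumption, in cases.

    buffer_per_week is a hypothetical cushion added on top of consumption.
    """
    # Cases that arrive each week but are never sold.
    surplus_per_week = max(ordered_per_week - consumed_per_week, 0)
    return {
        "recommended_order": consumed_per_week + buffer_per_week,
        "surplus_per_week": surplus_per_week,
        "surplus_over_period": surplus_per_week * weeks,
    }


# The example from the text: 500 cases ordered, 100 consumed.
result = order_recommendation(500, 100)
print(result)
```

With the numbers from the text, the retailer accumulates 400 unsold cases every week, or 20,800 cases over a 52-week year if nothing changes, which is why cutting the order toward 100 cases is the collaborative answer.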
We are pleased to be hosting Pepsico, along with Trifacta and Tableau, in a special webinar devoted to CPFR, data storage, data wrangling, and visualization. Please join us next Wednesday, and contact us to learn more about how Hortonworks and our partners can help you wrangle your data. You too can benefit from visualization, data wrangling, and a future-proof platform. It will equip you to be just like the lone cowboy, wrangling your data safely so you can ride off into the sunset! Register NOW!