On August 4th at 10:00 am PST, Eric Thorsen, General Manager Retail/CP at Hortonworks, and Krishnan Parasuraman, VP Business Development at Splice Machine, will be talking about how Hadoop can be leveraged as a scale-out relational database to serve as the System of Record and power mission-critical applications.
In this blog, they provide answers to some of the most frequently asked questions they have heard on the topic.
Although Hadoop’s heritage and initial success were in batch-based applications and analytic workloads, the platform has since evolved to support real-time, highly interactive applications. The introduction of HBase into the ecosystem enabled real-time, incremental writes on top of the immutable Hadoop file system. With Splice Machine, companies can now support ACID transactions on top of data resident in HBase. As a full-featured Hadoop RDBMS that supports ANSI standard SQL, secondary indexes, constraints, complex joins and highly concurrent transactions, the Splice Machine database, together with the Hortonworks Data Platform, enables enterprises to power real-time OLTP applications and analytics, especially as they approach big data scale.
With an increasing number of channels and customer interactions across each of them, retailers are looking for opportunities to better harness this data to drive real-time decision-making, whether in personalizing their marketing activities and delivering targeted campaigns, optimizing their assortment and merchandising decisions, or improving the efficiency of their supply chains.
A retail enterprise has multiple data repositories that require RDBMS capabilities but, at the same time, is challenged by the need to scale them. For example, a Demand Signal Repository is a common System of Record that houses point of sale data, inventory information, forecasts, promotions and shipments. This data needs to be harmonized and maintained in a consistent state. It needs to support operational reporting, such as stock-outs, as well as complex analytics, such as forecasts. We also hear from those enterprises that the traditional databases that house this data, such as Oracle, SQL Server or DB2, are unable to scale beyond a few terabytes and become too cumbersome to maintain. This clearly spells out the need for a data platform that can scale effortlessly to manage massive volumes of data and, at the same time, provide RDBMS capabilities that have feature and function parity with their existing systems.
In retail, there are various processes that encompass both transactional and analytical workloads. For example, a campaign management system needs to ingest real-time customer data from multiple sources and potentially deliver personalized messages to those individuals. This is a highly transactional process with customer profile lookups and real-time updates. It requires concurrent system access that can scale effortlessly, especially during peak shopping seasons. That same system also needs to be able to run fairly complex analytics such as audience segmentation, look-alike modeling and offer optimization.
Retailers typically run the transactional process via a campaign management or CRM application on top of a traditional database such as Oracle or SQL Server, and run their analytic processing on a different data warehouse or an MPP data mart. As a result, they have had to maintain separate databases for these two workloads and move data back and forth between them. With the Hadoop RDBMS, they can run both the transactional (OLTP) and analytic (OLAP) workloads on the same data platform, eliminating the need to duplicate data and deal with ETL bottlenecks. This also enables their entire process to scale affordably with increasing data volumes.
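To make the idea of one platform serving both workloads concrete, here is a minimal sketch in Python. It uses the standard library's `sqlite3` module purely as a stand-in SQL engine; it is not Splice Machine, and the `customer_profile` table, its columns and the sample values are hypothetical. It shows only the pattern: a transactional single-row update and an analytic aggregate issued against the same table, with no copy or ETL step in between.

```python
import sqlite3


def run_demo():
    # sqlite3 is only a stand-in SQL engine for illustration; it is NOT Splice Machine.
    conn = sqlite3.connect(":memory:")

    # Hypothetical customer profile table (names and values are assumptions).
    conn.execute(
        "CREATE TABLE customer_profile ("
        " customer_id INTEGER PRIMARY KEY,"
        " segment TEXT NOT NULL,"
        " lifetime_spend REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer_profile VALUES (?, ?, ?)",
        [(1, "loyal", 500.0), (2, "loyal", 300.0), (3, "new", 40.0)],
    )
    conn.commit()

    # Transactional (OLTP-style) step: a single-row update that is
    # committed on success and rolled back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE customer_profile"
            " SET lifetime_spend = lifetime_spend + ?"
            " WHERE customer_id = ?",
            (60.0, 3),
        )

    # Analytic (OLAP-style) step: an aggregate over the same table,
    # reading the freshly updated row without moving the data anywhere.
    rows = conn.execute(
        "SELECT segment, COUNT(*), SUM(lifetime_spend)"
        " FROM customer_profile"
        " GROUP BY segment"
        " ORDER BY segment"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for segment, customers, total_spend in run_demo():
        print(segment, customers, total_spend)
```

In a real deployment the same two statements would be sent to the shared database through its own SQL interface; the point of the sketch is only that the segmentation query sees the update immediately because both run against one copy of the data.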
A good example is Harte Hanks. They are replacing the Oracle RAC database powering their campaign management solution with the Splice Machine Hadoop RDBMS. Harte Hanks is a global marketing services provider and serves some of the largest retailers in the market. They provide a 360-degree view of the customer through a customer relationship management system and enable cross-channel campaign analytics with real-time and mobile access. Their biggest challenge was that their customer queries were getting slower, in some cases taking over half an hour to complete. Expecting 30-50% future data growth, Harte Hanks was concerned that database performance issues would become progressively worse. Harte Hanks evaluated whether to continue scaling up to larger and more expensive proprietary servers or to seek solutions that can affordably scale out on commodity hardware. Splice Machine and Hadoop now support Harte Hanks’ mixed workload applications (OLAP and OLTP). They have achieved a 75% cost saving with a 3-7x increase in query speeds.
Overall, they have experienced a 10-20x improvement in price/performance without significant application, BI or ETL rewrites.