While Schremser was familiar with the work happening around big data platforms like Apache Hadoop, he initially feared that a big data platform would be out of the company’s reach.
“Much of the local talent base here in Louisville comes from large enterprise organizations. As a result, many of these individuals specialize in Windows and .NET,” said Schremser. “The typical crop of big data technologies, which rely on Linux, Java and even more ‘exotic’ languages like Scala, would have been a show-stopper for us. We couldn’t move fast enough if we had to build on unfamiliar technologies from the OS up.”
After learning about Microsoft and Hortonworks Data Platform (HDP) for Windows at the Microsoft TechEd conference, Schremser realized it offered just what he needed: the scalability and commodity economics of Hadoop running on a technology platform that his team knew and trusted.
To prove out his hunch, Schremser kicked off a proof-of-concept. Members of the data team at ZirMed quickly deployed a small Hadoop cluster running HDP for Windows 1.3 on eight modestly configured machines: two master nodes and six workers.
The team’s next win came equally quickly. Two weeks after standing up the cluster, the team finished loading the company’s key transactional data—nine years’ worth of electronic remittance data—into Hadoop.
Ready to put Hadoop to the ultimate test, the ZirMed team ran a battery of queries against the data in the cluster. The real-world queries they used fell into several buckets: those easily handled by the data warehouse, those that could be handled by the data warehouse but that required new aggregates to be created, and those that couldn’t be run on the data warehouse and had to be run against the source OLTP system. In each case, the Hadoop cluster bested the performance of the incumbent technology platform.
The standout performance gains came from the most challenging queries: those that had to be run against the OLTP system and that commonly required 24–48 hours to complete in that environment. With HDP for Windows, the longest-running of these queries took only 25 minutes.
Now confident in his decision to turn to HDP for Windows, Schremser took the results to his CEO with his recommendation to move forward.
With the CEO’s nod, the team built a production Hadoop cluster running HDP for Windows 2.0. The 29-node cluster contains 1.2 PB of raw storage, of which 420 TB is usable after replication, at a total hardware cost of $235,000. The company was used to paying $300,000 for 100 TB of raw enterprise storage alone (75–80 TB usable), with no processing power, and had invested over $750,000 once licenses and compute servers were factored in.
“With HDP for Windows we got a really resilient platform with 5 times the amount of usable storage, plus a ton of processing power, for about 30% of the cost of traditional enterprise technologies,” said Schremser.
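The figures Schremser cites can be sanity-checked with a quick back-of-the-envelope calculation. All numbers below come from the case study itself; the only assumption is using the midpoint (77.5 TB) of the 75–80 TB usable range quoted for the legacy enterprise storage:

```python
# Sanity check of the storage and cost comparison quoted in the case study.

hadoop_usable_tb = 420       # usable HDFS capacity after replication
hadoop_cost = 235_000        # total hardware cost of the 29-node cluster (USD)

san_usable_tb = 77.5         # assumed midpoint of the 75-80 TB usable quoted
san_total_cost = 750_000     # legacy storage + licenses + compute servers (USD)

storage_ratio = hadoop_usable_tb / san_usable_tb   # ~5.4x more usable storage
cost_ratio = hadoop_cost / san_total_cost          # ~31% of the legacy cost

print(f"Usable storage: {storage_ratio:.1f}x the enterprise platform")
print(f"Cost: {cost_ratio:.0%} of the traditional investment")
```

The result lines up with the quote: roughly five times the usable storage at about 30% of the cost.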
Queries that took 25 minutes on the proof-of-concept cluster now take 5 minutes in the production environment, even with more data, as the company continues to load transactions into the Hadoop cluster. ZirMed’s team believes it could optimize the query down to 2 minutes, and it looks forward to adopting emerging Hadoop technologies like YARN and Tez for even faster performance.
An additional benefit: because the data on Hadoop is accessible via Hive, it is much easier for the team to work with. “It’s SQL. And anyone can write SQL,” said Schremser. “We expect our success with analytics and Hadoop to have a dramatic effect on ZirMed’s business, allowing us to increase customer satisfaction and create new revenue streams,” said Butts. “This is the first time in my career that I’ve seen technology move faster than the business.”
As a result of the company’s successful “Analytics 3.0” endeavor, ZirMed has begun to articulate an expanded vision of the next generation of health management, powered by data and analytics.
“We believe that the future of healthcare is in marrying data from financial and clinical applications with other public and private data sets,” said Butts. “This will allow providers to deliver a higher level of care while enhancing financial results, and we’re excited to be at the forefront of this trend.”