Paul Boal, Director of Data Management & Analytics at Mercy, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform and Mercy’s journey to the Data Lake.
Mercy has long been committed to using technology to improve medical outcomes for patients. We were among the first health care organizations in the U.S. to have a comprehensive, integrated electronic health record (EHR) providing real-time, paperless access to patient information.
We use an EHR from Epic Systems. Every patient activity is entered into the Epic database, including both clinical and financial interactions. All reporting and analysis against the Epic database is done via an associated Oracle-based data warehouse called Clarity. At Mercy’s size, Clarity poses the usual challenges associated with data warehouses: it is expensive to scale, it requires a rigid data schema, and it is slow for some queries.
To overcome these challenges, Mercy has partnered with Hortonworks to create the Mercy Data Library, a Hadoop-based data lake running on Hortonworks Data Platform (HDP). The Data Library will contain volumes of batch data extracts from relational systems like Clarity and Lawson as well as real-time data directly from Epic. We will soon integrate other data sources, including social media and weather information for specialized projects.
The strength of Hadoop as a data platform is its ability to ingest and combine data sets from all of these sources and formats. Bringing all of these data sets together on a common platform lets us ask questions we couldn’t ask before, and at an increasingly large scale. Because storage on the platform is inexpensive, we can keep information that we might otherwise have discarded when storage was costly.
To understand the advanced analytic applications that we plan at Mercy using HDP, take as an example our patient vitals project. Today, when a patient is in the ICU, the bedside devices send a record of the patient’s vitals to the EHR once every second. Roughly every fifteen minutes, a nurse in the ICU reviews the patient’s vitals in the EHR and selects one set of readings as a “good reading.” All of the other data is erased from the system. There are very good reasons for this practice when Epic is part of our data architecture.
However, the frequency of readings captured in Clarity doesn’t allow analysis of some questions. What if a researcher were interested in determining which medicines bring down a fever fastest? The readings recorded in Epic are not fine-grained enough to measure a medicine’s efficacy over seconds or a few minutes.
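To make the point concrete, here is a minimal, hypothetical sketch (not Mercy code, and simulated data) of the kind of calculation per-second readings enable: estimating how quickly a fever responds to a medicine by fitting a slope to a few minutes of temperature readings. A single fifteen-minute snapshot could not produce this rate at all.

```python
# Hypothetical illustration: estimate fever decline rate from
# per-second temperature readings via a least-squares slope.

def decline_rate(readings):
    """Least-squares slope of (second, temp_F) pairs, in degrees F per minute."""
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in readings)
    den = sum((t - mean_t) ** 2 for t, _ in readings)
    return (num / den) * 60  # per-second slope -> per-minute rate

# Simulated per-second readings: a fever falling 0.01 F each second
per_second = [(s, 103.0 - 0.01 * s) for s in range(300)]
print(decline_rate(per_second))  # -0.6 F per minute
```

With five minutes of per-second data, a researcher can compare this decline rate across medicines; with one recorded reading per fifteen minutes, the rate over seconds or a few minutes is simply unobservable.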
Also, the noisiness of the vital readings may give the clinical staff a valuable indication of how much the patient is moving around within those fifteen minutes. There may also be correlations between movement and heart rate, breathing, or pain. But without detailed readings, these correlations remain hidden behind the coarseness of the data that we were collecting with Epic.
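The idea that noise is itself a signal can be sketched simply. The following is a hypothetical illustration (not Mercy code; the window size and readings are assumptions): the rolling standard deviation of per-second heart-rate readings serves as a crude proxy for patient movement.

```python
# Hypothetical illustration: second-to-second variability ("noise")
# in a vital sign as a crude movement proxy.
from statistics import pstdev

def rolling_stdev(values, window=30):
    """Standard deviation over a sliding window; higher suggests more movement."""
    return [pstdev(values[i:i + window]) for i in range(len(values) - window + 1)]

resting = [70] * 60                                # flat readings: patient still
restless = [70 + (i % 7) * 3 for i in range(60)]   # jittery readings

print(max(rolling_stdev(resting)))        # 0.0 -- no variability at all
print(max(rolling_stdev(restless)) > 2)   # True -- clearly noisy signal
```

Once every reading is retained rather than discarded, a derived series like this can be compared against heart rate, respiration, or pain scores to look for the correlations described above.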
Finally, the Clarity reporting database is updated only once per night with the previous day’s Epic data. For data analysis to have an immediate impact on patients under our care, the data used for decision making has to be nearly real-time.
With our Hadoop-based Data Library, we hope to move closer to a real-time, data-on-demand model for researchers and clinicians. We currently use a combination of Apache Sqoop, Apache Storm, and Apache HBase for more granular updates. This pipeline applies updates every hour, and we expect to shorten that interval to only two or three minutes in the future.
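Storing per-second vitals in HBase turns largely on row-key design. Below is a minimal sketch of one common pattern; it is an assumed design for illustration, not Mercy’s actual schema. Prefixing the key with a patient identifier keeps each patient’s readings contiguous, and a reversed timestamp makes the newest reading sort first, which suits “latest vitals” queries.

```python
# Assumed, illustrative HBase-style row key for per-second vitals
# (not Mercy's real schema). HBase sorts rows lexicographically, so a
# reversed timestamp places the most recent reading first in a scan.
MAX_TS = 10**13  # assumed ceiling for epoch-millisecond timestamps

def vitals_row_key(patient_id: str, epoch_ms: int) -> str:
    reversed_ts = MAX_TS - epoch_ms
    return f"{patient_id}#{reversed_ts:013d}"

newer = vitals_row_key("patient-001", 1_700_000_001_000)
older = vitals_row_key("patient-001", 1_700_000_000_000)
print(newer < older)  # True: newer readings sort ahead of older ones
```

Under a scheme like this, fetching a patient’s most recent readings is a short prefix scan rather than a full sort, which matters when rows arrive every second.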
One important thing that we’ve learned is to not neglect the knowledge already in the existing Clarity data model. Instead, we try to leverage that knowledge when we replicate the data into Hadoop. We wrap the Oracle data with additional metadata, allowing us to introduce functionality and features not available via Clarity into our analysis and reports.
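One way to picture this metadata wrapping is sketched below. The field names and the sample table are illustrative assumptions, not the real Clarity schema: the point is that a replicated table carries the source model’s knowledge (types, descriptions) plus lineage fields, enabling queries that Clarity itself does not offer.

```python
# Hypothetical sketch of wrapping replicated Clarity data with extra
# metadata. All names here are illustrative assumptions.
clarity_table_meta = {
    "source_system": "Clarity (Oracle)",
    "table": "PAT_ENC",                      # illustrative table name
    "loaded_at": "2015-06-01T02:00:00Z",     # lineage: when it landed in Hadoop
    "load_method": "sqoop-batch",            # lineage: how it landed
    "columns": {
        "PAT_ID": {"type": "VARCHAR2", "description": "Patient identifier"},
        "CONTACT_DATE": {"type": "DATE", "description": "Encounter date"},
    },
}

def columns_of_type(meta, oracle_type):
    """Find columns by source type -- a lookup the source warehouse doesn't expose."""
    return [name for name, m in meta["columns"].items() if m["type"] == oracle_type]

print(columns_of_type(clarity_table_meta, "DATE"))  # ['CONTACT_DATE']
```

Metadata records like this can drive report generation and data discovery on the Hadoop side while preserving what the Clarity data model already encodes.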
While open source is not necessarily a priority for Mercy, we have benefited significantly from the rate of innovation in the open-source Hadoop ecosystem. We are also grateful to have a partner in Hortonworks, whose commitment to serving both their customers and the community has created a strong relationship with Mercy.
Mercy is the fifth largest Catholic health care system in the U.S. and serves millions annually. Mercy includes 35 acute care hospitals, four heart hospitals, two children’s hospitals, three rehab hospitals and two orthopedic hospitals, nearly 700 clinic and outpatient facilities, 40,000 co-workers and more than 2,000 Mercy Clinic physicians in Arkansas, Kansas, Missouri and Oklahoma. Mercy also has outreach ministries in Louisiana, Mississippi and Texas. For specific information about Mercy’s commercial technology services, visit mercytechnology.net.