November 21, 2013

Modern Healthcare Architectures Built with Hadoop

We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.

One key obstacle in the healthcare market is poor data liquidity (for patients, practitioners and payers), and some organizations are using Apache Hadoop to overcome this challenge as part of a modern data architecture. This post describes some healthcare use cases, a healthcare reference architecture, and how Hadoop can ease the pain caused by poor data liquidity.

New Value Pathways for Healthcare

In January 2013, McKinsey & Company published a report titled “The ‘Big Data’ Revolution in Healthcare”. The report points out how big data is creating value in five “new value pathways” by allowing data to flow more freely. Below we present a summary of these five new value pathways and an example of how Hadoop can be used to address each. Thanks to the Clinical Informatics Group at UC Irvine Health for many of the use cases, described in their UCIH case study.

| Pathway | Benefit | Hadoop Use Case |
| --- | --- | --- |
| Right Living | Patients can build value by taking an active role in their own treatment, including disease prevention. | Predictive Analytics: Heart patients weigh themselves at home with scales that transmit data wirelessly to their health center. Algorithms analyze the data and flag patterns that indicate a high risk of readmission, alerting a physician. |
| Right Care | Patients get the most timely, appropriate treatment available. | Real-time Monitoring: Patient vital statistics are transmitted from wireless sensors every minute. If vital signs cross certain risk thresholds, staff can attend to the patient immediately. |
| Right Provider | Provider skill sets are matched to the complexity of the assignment, for instance nurses or physicians’ assistants performing tasks that do not require a doctor. Also the specific selection of the provider with the best outcomes. | Historical EMR Analysis: Hadoop reduces the cost to store data on clinical operations, allowing longer retention of data on staffing decisions and clinical outcomes. Analysis of this data allows administrators to promote individuals and practices that achieve the best results. |
| Right Value | Ensure cost-effectiveness of care, such as tying provider reimbursement to patient outcomes, or eliminating fraud, waste, or abuse in the system. | Medical Device Management: For biomedical device maintenance, hospitals use geolocation and sensor data to manage medical equipment. The biomedical team can know where all the equipment is, so they don’t waste time searching for an item. Over time, they can determine the usage of different devices and use this information to make rational decisions about when to repair or replace equipment. |
| Right Innovation | The identification of new therapies and approaches to delivering care, across all aspects of the system. Also improving the innovation engines themselves. | Research Cohort Selection: Researchers at teaching hospitals can access patient data in Hadoop for cohort discovery, then present the anonymous sample cohort to their Institutional Review Board for approval, without ever having seen uniquely identifiable information. |

Source: The ‘Big Data’ Revolution in Healthcare. McKinsey & Company, January 2013.
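
To make the real-time monitoring use case above more concrete, here is a minimal Python sketch of a threshold check on incoming vital-sign readings. It is not taken from any customer implementation: the `Reading` type, the `alerts` function, and the threshold values are all hypothetical and chosen for illustration only, not as clinical guidance. In a real deployment this kind of logic would more likely run inside a streaming job than as a standalone script.

```python
from dataclasses import dataclass

# Hypothetical acceptable ranges, for illustration only (not clinical guidance).
THRESHOLDS = {
    "heart_rate": (40, 130),    # beats per minute
    "spo2": (90, 100),          # percent oxygen saturation
    "systolic_bp": (90, 180),   # mmHg
}


@dataclass
class Reading:
    """One sensor measurement for one patient (hypothetical schema)."""
    patient_id: str
    vital: str
    value: float


def out_of_range(reading, thresholds=THRESHOLDS):
    """Return True when the reading falls outside its configured range."""
    low, high = thresholds[reading.vital]
    return not (low <= reading.value <= high)


def alerts(readings, thresholds=THRESHOLDS):
    """Return the readings that should be escalated to staff.

    Readings for vitals without a configured range are ignored.
    """
    return [
        r for r in readings
        if r.vital in thresholds and out_of_range(r, thresholds)
    ]


if __name__ == "__main__":
    batch = [
        Reading("patient-1", "heart_rate", 72),
        Reading("patient-2", "spo2", 85),
    ]
    for r in alerts(batch):
        print(f"ALERT {r.patient_id}: {r.vital}={r.value}")
```

Keeping the ranges in a plain dictionary means they can be adjusted per unit or per patient without touching the checking logic.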

At Hortonworks, we see our healthcare customers ingest and analyze data from many sources. The following reference architecture is an amalgam of Hadoop data patterns that we’ve seen with our customers’ use of Hortonworks Data Platform (HDP). Components shaded green are part of HDP.


Sources of Healthcare Data

Source data comes from:

  • Legacy Electronic Medical Records (EMRs)
  • Transcriptions
  • PACS
  • Medication Administration
  • Financial
  • Laboratory (e.g. SunQuest, Cerner)
  • RTLS (for locating medical equipment & patient throughput)
  • Bio Repository
  • Device Integration (e.g. iSirona)
  • Home Devices (e.g. scales and heart monitors)
  • Clinical Trials
  • Genomics (e.g. 23andMe, Cancer Genomics Hub)
  • Radiology (e.g. RadNet)
  • Quantified Self Sensors (e.g. Fitbit, SmartSleep)
  • Social Media Streams (e.g. FourSquare, Twitter)

Loading Healthcare Data

Apache Sqoop is included in Hortonworks Data Platform as a tool to transfer data from external structured data stores (such as Teradata, Netezza, MySQL, or Oracle) into HDFS or related systems like Hive and HBase. We also see our customers using other tools or standards for loading healthcare data into Hadoop.
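
As a rough illustration of the Sqoop transfer described above, the following Python sketch assembles a `sqoop import` command line for copying one relational table into HDFS. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1 import arguments, but the JDBC URL, table name, paths, and user name are made-up placeholders, and the helper `sqoop_import_command` is hypothetical. The sketch only builds and prints the command; it does not run Sqoop.

```python
import shlex


def sqoop_import_command(jdbc_url, table, target_dir, username,
                         password_file, num_mappers=4):
    """Build the argument list for a basic `sqoop import` of one table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the DB password
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# All values below are placeholders for illustration.
cmd = sqoop_import_command(
    jdbc_url="jdbc:mysql://db.example.org/clinical",
    table="lab_results",
    target_dir="/data/raw/lab_results",
    username="etl_user",
    password_file="/user/etl_user/.db_password",
)

# Print a shell-safe version of the command for review.
print(shlex.join(cmd))
```

On a machine with Sqoop installed, the same list could be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids shell-quoting problems.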

Processing Healthcare Data

Depending on the use case, healthcare organizations process data in batch (using Apache Hadoop MapReduce and Apache Pig); interactively (with Apache Hive); online (with Apache HBase); or as streams (with Apache Storm).

Analyzing Healthcare Data

Once data is stored and processed in Hadoop it can either be analyzed in the cluster or exported to relational data stores for analysis there. These data stores might include:

  • Enterprise data warehouse
  • Quality data mart
  • Surgical data mart
  • Clinical info data mart
  • Diagnosis data mart
  • Neo4j graph database

Many data analysis and visualization applications can also work with the data directly in Hadoop. Hortonworks healthcare customers typically use the following business intelligence and visualization tools to inform their decisions:

  • Microsoft Excel
  • Tableau
  • RESTful Web Services
  • EMR Real-time analytics
  • Metric Insights
  • Patient Scorecards
  • Research Portals
  • Operational Dashboard
  • Quality Dashboards

The following diagram shows how healthcare organizations can integrate Hadoop into their existing data architecture to create a modern data architecture that is interoperable and familiar, so that the same team of analysts and practitioners can use their existing skills in new ways:

Healthcare Ecosystem

As more and more healthcare organizations adopt Hadoop to disseminate data to their teams and partners, they empower caregivers to combine their training, intuition, and professional experience with big data to make data-driven decisions that cure patients and reduce costs.

Watch our blog in the coming weeks as we share reference architectures for other industry verticals.




Bob Rogers says:

Very interesting and useful article. Thank you.

Your readers may be interested to know that Apixio has built a Big Data infrastructure for healthcare that reflects many of the concepts you articulated, and which includes Hortonworks Hadoop. We have solved the data liquidity problem by providing real business value to our healthcare provider and healthcare payer customers and by developing a large number of interfaces that allow us to import data from all of the major EHR products.

-Bob Rogers, PhD
Chief Scientist, Apixio

Fred says:

Why is UIMA listed as a tool for loading data? It is for “facilitating the analysis of unstructured content”.
And what is “JAVA ETL rules”?

Pei says:

I believe UIMA is just a skeleton or framework. So I’m also not sure how that would be helpful in the data loading process. I think you would need something like Apache cTAKES or Wired Informatics’ Invenio product to make use of unstructured healthcare data.

Ganesan Ganapathi says:

Would you be able to share a success story from any customer that has implemented an end-to-end modern healthcare architecture?
