March 31, 2016

Hadoop in Healthcare, pt. 2

This is the second post in a series written by Richard Proctor, GM of Global Healthcare at Hortonworks, Inc. The first can be found here. The series discusses the reasons for healthcare's surging interest in, and rapid adoption of, Hadoop.

Healthcare represents the next frontier in gaining big insights from Big Data. Remote patient monitoring, telehealth, and proactive patient behavior insights hold the promise of improving chronic disease management and meeting the triple aim of:

  1. Improving the patient experience of care, including quality and satisfaction
  2. Improving the health of populations
  3. Reducing the per capita cost of health care

The Big Data Opportunity: Improved Health Outcomes at Reduced Costs

The healthcare industry is undergoing a revolution, driven by an irreversible surge in the quantity and availability of data. Healthcare providers, payers, researchers, and increasingly the patients themselves are generating that data from a wide range of sources, both within and outside the four walls of the care delivery system:

  • Home Health
  • Long- and short-term care
  • Hospice
  • Clinic
  • Specialty care
  • Pharmacy-based

This data includes traditional clinical and transactional data such as claims, electronic medical records (EMR), lab results, and radiological images. These sources are used almost exclusively at the moment of care with an occasional retroactive lookup. Yet this rich trove of information is almost never available for analysis in the aggregate to look for trends across patients, caregivers, and protocols.

The healthcare data surge also includes newer sources generated from wearable sensors and devices, mobile phones, websites and social media interactions. Traditional data platforms cannot store this multifaceted data without costly extract-transform-load (ETL) processing. As a result, this rich trove of data on what people actually do about their own health is unavailable to clinicians, researchers and patients. In the cases where unstructured data is available, it is retained only for weeks or months. Keeping it longer than that is too expensive.

Hadoop can enhance operational efficiencies to reduce costs and friction in the changing healthcare landscape. In a recent survey, management consulting firm McKinsey & Company estimated the potential operational savings to be in the hundreds of billions of dollars:

“To determine the opportunity of [big data], we evaluated a range of health-care initiatives and assessed their potential impact as total annual cost savings, holding outcomes constant, using a 2011 baseline. If these early successes were scaled up to create system wide impact, we estimate that [big data] could account for $300 billion to $450 billion in reduced health-care spending, or 12 to 17 percent of the $2.6 trillion baseline in US health-care costs.”
McKinsey & Company
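
The percentages in the quote can be checked against the stated baseline. The short Python sketch below does only that arithmetic; the constant and function names are illustrative and do not come from the McKinsey report. The results, $312 billion and $442 billion, suggest the quoted range of $300 billion to $450 billion is rounded.

```python
# Check the quoted percentages against the quoted 2011 baseline.
# All figures are in billions of US dollars; names are illustrative.
BASELINE_BILLIONS = 2600  # $2.6 trillion


def share_of_baseline(percent: int) -> int:
    """Return `percent` of the baseline, in billions of dollars."""
    return BASELINE_BILLIONS * percent // 100


low = share_of_baseline(12)
high = share_of_baseline(17)
print(f"12% of baseline: ${low} billion")
print(f"17% of baseline: ${high} billion")
```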

Challenges Facing Healthcare Today

Compounding the challenge facing healthcare organizations is the fact that collected data can have little or no value when viewed as individual records or small groups of records. At high volumes, or with a longer historical perspective, the same data can be inspected for patterns and used for advanced analytic applications.

In addition, healthcare enterprises seeking to harness the power of big data face a distinct set of challenges. These include:

  • Data integration – Leveraging diverse datasets in spite of privacy and security concerns, a lack of accepted standards, and a heterogeneous community of data collectors.
  • Resourcing – Many of the most valuable insights driven by big data have been the result of predictive analytics, but these require tools and techniques new to many healthcare organizations, such as Hadoop and machine learning.
  • Applying data-driven insights – Incorporating big data analytics into healthcare delivery will require a shift in industry practices that opens the door to acting on analytical insights in addition to more traditional results from randomized clinical trials.

The Opportunity

By harnessing low-cost commodity servers or cloud options and an open source development model, Hadoop data platforms provide an economic and high-performance approach to data storage and processing that can scale to meet the needs of the very largest healthcare organizations.

Many organizations are already harnessing the power of data to build a single view of operations in order to gain visibility into:

  • Population health
  • Changing reimbursement models
  • Proactive supply chain and staffing management
  • Combining Big Data and machine learning algorithms
  • Discovering new clinical insights by combining EMR data with new, unstructured, purchased, and publicly available data

Healthcare entities store and process traditional and non-traditional data in centralized data repositories rather than the siloed environments of the past. This new data lake offers three key capabilities that are changing the way data is leveraged:

  1. Collect Everything – a data lake can contain all raw sources of healthcare data over extended periods of time, either for immediate decisions or historical analysis.
  2. Dive In Anywhere – a data lake enables users across multiple business units to explore a single, shared dataset to answer their specific questions.
  3. Access Data in Many Ways – a data lake supports multiple data access patterns across a shared infrastructure, enabling access via batch, interactive, real-time, in-memory, and search methods.
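
The three capabilities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Hadoop API: the in-memory list `RAW_LAKE` stands in for raw files in a data lake, and the record fields (`source`, `patient_id`, `amount_usd`, `heart_rate`) and function names are hypothetical. Raw records of different shapes are kept unchanged, and each consumer applies its own interpretation when it reads them.

```python
import json
from collections import defaultdict

# Hypothetical raw records from different sources, stored unchanged
# ("Collect Everything"). A list of JSON lines stands in for lake storage.
RAW_LAKE = [
    '{"source": "claims", "patient_id": "p1", "amount_usd": 120.0}',
    '{"source": "wearable", "patient_id": "p1", "heart_rate": 88}',
    '{"source": "claims", "patient_id": "p2", "amount_usd": 80.0}',
    '{"source": "wearable", "patient_id": "p2", "heart_rate": 72}',
    '{"source": "wearable", "patient_id": "p1", "heart_rate": 92}',
]


def read_lake(source):
    """Parse raw lines at read time and yield the records from one source."""
    for line in RAW_LAKE:
        record = json.loads(line)
        if record["source"] == source:
            yield record


def total_claims_by_patient():
    """A finance-style question asked of the shared dataset."""
    totals = defaultdict(float)
    for rec in read_lake("claims"):
        totals[rec["patient_id"]] += rec["amount_usd"]
    return dict(totals)


def mean_heart_rate_by_patient():
    """A clinical-style question asked of the same shared dataset."""
    readings = defaultdict(list)
    for rec in read_lake("wearable"):
        readings[rec["patient_id"]].append(rec["heart_rate"])
    return {pid: sum(vals) / len(vals) for pid, vals in readings.items()}


print(total_claims_by_patient())
print(mean_heart_rate_by_patient())
```

Both functions read the same stored records without any upfront transformation, which is the point of the pattern. On a real platform the list would be replaced by files in distributed storage and the loops by a batch or interactive query engine.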

Healthcare organizations of all sizes and segments can benefit from an open enterprise data platform that supports data in flight and data at rest. From managing inbound sensor data to predicting post-care behavior by analyzing and combining a multitude of data sources like social media, pharmacy, and IoT technologies, data is constantly in motion and must be managed and analyzed. Once collected, data at rest can be mined to determine product trends, consumer demand, and future promotions.

The Hadoop ecosystem and its open source approach present an unprecedented opportunity to build a platform of the future in healthcare. Whether the focus is data at rest or data in motion, no earlier solution has been as economically and technically viable as what is available now.

