Big Data Analytics for the Pharmaceutical Industry and Clinical Trials

HDP for Pharma Yield Optimization

Merck & Co., Inc. adopted Hortonworks Data Platform (HDP™) to overcome three data challenges that stood in the way of improving yields in its manufacturing process. First, it needed to combine years of data from multiple silos within its organization. Second, Merck needed to extend both the amount of data it could capture and the length of time it could retain that data. Finally, the Merck team wanted to test new hypotheses "virtually" at a far lower cost than testing those ideas with real-world material and equipment. With HDP, Merck overcame those challenges, combining 10 years of vaccine manufacturing data and conducting 5.5 million cross-batch comparisons over 10 billion records. The resulting yield improvement grew profits by $10 million for that one vaccine.

Unlocking the Power of Pharmaceutical Data

Big Data integration, internal and external collaboration, portfolio decision support, more efficient clinical trials, faster time to market, improved yields, and improved safety are just a few of the benefits that pharmaceutical companies around the world achieve by tapping into the full power of their data.

Use Cases

Merck Optimizes Vaccine Yields: Striving for the “Golden Batch”

Merck optimized its vaccine yields by analyzing manufacturing data to isolate the most important predictive variables for a “golden batch”. Merck’s leaders had long relied on Lean manufacturing to grow volumes and reduce costs, but it became increasingly difficult to discover incremental ways to enhance yields. They looked to Open Enterprise Hadoop for new insights that could further reduce costs and improve yields. Merck turned to Hortonworks for data discovery across records on 255 batches of one vaccine going back 10 years. That data had been distributed across 16 maintenance and building management systems, and it included precise sensor data on calibration settings, air pressure, temperature, and humidity. After pooling all the data into Hortonworks Data Platform and processing 15 billion calculations, Merck had new answers to questions it had been asking for a decade. Among hundreds of variables, the Merck team was able to spot those that optimized yields. The company proceeded to apply those lessons to its other vaccines, with a focus on providing quality drugs at the lowest possible price. Watch Doug Henschen’s InformationWeek interview with George Llado of Merck.
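
As a rough illustration of what "isolating the most important predictive variables" can mean in practice, the sketch below ranks process variables by the strength of their Pearson correlation with batch yield. This is only one simple technique and is not a description of Merck's actual analysis; the function names, the variable names, and the four sample batches are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no predictive signal
    return cov / sqrt(var_x * var_y)

def rank_variables(batches, target="yield"):
    """Rank every non-target variable by |correlation| with the target."""
    names = [name for name in batches[0] if name != target]
    ys = [batch[target] for batch in batches]
    scores = {
        name: pearson([batch[name] for batch in batches], ys)
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical, made-up batch records for illustration only.
batches = [
    {"temperature": 20.0, "humidity": 40.0, "air_pressure": 101.2, "yield": 0.70},
    {"temperature": 21.0, "humidity": 45.0, "air_pressure": 101.0, "yield": 0.72},
    {"temperature": 22.0, "humidity": 38.0, "air_pressure": 101.3, "yield": 0.74},
    {"temperature": 23.0, "humidity": 44.0, "air_pressure": 101.1, "yield": 0.76},
]

ranked = rank_variables(batches)
for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

In this toy data, temperature moves in lockstep with yield, so it ranks first. A real analysis over hundreds of variables would also need to account for interactions between variables and for correlation that does not reflect causation.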

Minimizing Waste Across the Drug Manufacturing Process

One Hortonworks pharmaceutical customer uses HDP for a single view of its supply chain and its self-declared “War on Waste”. The operations team added up the ingredients going into its drugs and compared that total with the physical product it shipped. The team found a big gap between the two and launched its War on Waste, using HDP to identify where those valuable resources were going. Now that the root causes of waste have been identified, real-time alerts in HDP notify the team when it is at risk of exceeding pre-determined thresholds.
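
The comparison described above is essentially a mass balance with a threshold check. A minimal sketch, assuming material is tracked in kilograms per production stage, might look like the following. The stage names, quantities, and thresholds are invented for illustration, and the code does not use any HDP alerting API.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    input_kg: float      # material entering the stage
    output_kg: float     # material leaving the stage
    max_loss_pct: float  # pre-determined loss threshold for this stage

def loss_pct(stage):
    """Percentage of input material lost within a stage."""
    return 100.0 * (stage.input_kg - stage.output_kg) / stage.input_kg

def stages_over_threshold(stages):
    """Names of stages whose loss exceeds their own threshold."""
    return [s.name for s in stages if loss_pct(s) > s.max_loss_pct]

# Hypothetical stages and numbers, for illustration only.
stages = [
    Stage("blending", 1000.0, 980.0, 3.0),
    Stage("granulation", 980.0, 900.0, 5.0),
    Stage("packaging", 900.0, 890.0, 2.0),
]

# Overall gap between ingredients going in and product shipped.
total_gap_kg = stages[0].input_kg - stages[-1].output_kg
alerts = stages_over_threshold(stages)

print(f"total gap: {total_gap_kg} kg")
print(f"stages over threshold: {alerts}")
```

Breaking the overall gap down by stage is what turns a single discouraging number into a list of places to investigate.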

Translational Research: Turning Scientific Studies Into Personalized Medicine

The goal of Translational Research is to apply the results of laboratory research towards improving human health. Hadoop empowers researchers, clinicians, and analysts to unlock insights from translational data to drive evidence-based medicine programs. The data sources for translational research are complex and typically locked in data silos, making it difficult for scientists to obtain an integrated, holistic view of their data. Other challenges revolve around data latency (the delay in getting data loaded into traditional data stores), handling unstructured and semi-structured types of data, and bridging the lack of collaborative analysis between translational and clinical development groups. Researchers are turning to Open Enterprise Hadoop as a cost-effective, reliable platform for performing advanced analytics on integrated translational data. HDP allows translational and clinical groups to combine key data from sources such as:
• Omics (genomics, proteomics, transcription profiling, etc.)
• Preclinical data
• Electronic lab notebooks
• Clinical data warehouses
• Tissue imaging data
• Medical devices and sensors
• File sources (such as Excel and SAS)
• Medical literature
Through Hadoop, analysts can build a holistic view that helps them understand biological response and molecular mechanisms for compounds or drugs. They’re also able to uncover biomarkers for use in R&D and clinical trials. Finally, they can be assured that all data will be stored forever, in its native format, for analysis with multiple future applications.

Next Generation Sequencing

Traditional IT systems cannot economically store and process next generation sequencing (NGS) data. For example, primary sequencing results are in large image format and are too costly to store over the long term. Point solutions have lacked the flexibility to keep up with changing analytical methodologies, and are often expensive to customize and maintain. Open Enterprise Hadoop overcomes those challenges by helping data scientists and researchers unlock insights from NGS data while preserving the raw results on a reliable, cost-effective platform. NGS scientists are discovering the benefits of large-scale processing and analysis delivered by HDP components such as Apache Spark. Pharmaceutical researchers are using Hadoop to easily ingest diverse data types from external sources of genetic data, such as TCGA, GenBank, and EMBL. Another clear advantage of HDP for NGS is that researchers have access to cutting-edge bioinformatics tools built specifically for Hadoop. These enable analysis of various NGS data formats, sorting of reads, and merging of results. This takes NGS to the next level through:
• Batch processing of large NGS data sets
• Integration of internal with publicly available external sequence data
• Permanent data storage for large image files, in their native format
• Substantial cost savings on data processing and storage
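
To make "sorting of reads, and merging of results" concrete, here is a small plain-Python sketch of the underlying idea: sort each chunk of aligned reads by genomic coordinate, then merge the sorted chunks into one ordered stream. It does not use any Hadoop bioinformatics tool; the tuple layout `(chromosome, position, read_id)` and the sample reads are assumptions made for illustration.

```python
import heapq

def coordinate(read):
    """Sort key: (chromosome, position). Chromosome names compare as plain
    strings here, which is a simplification ("chr10" sorts before "chr2")."""
    return (read[0], read[1])

def sort_reads(reads):
    """Sort one chunk of reads by genomic coordinate."""
    return sorted(reads, key=coordinate)

def merge_sorted(*chunks):
    """Merge chunks that are each already coordinate-sorted."""
    return list(heapq.merge(*chunks, key=coordinate))

# Hypothetical reads: (chromosome, position, read_id).
chunk_a = [("chr1", 500, "read3"), ("chr1", 100, "read1")]
chunk_b = [("chr2", 50, "read4"), ("chr1", 300, "read2")]

merged = merge_sorted(sort_reads(chunk_a), sort_reads(chunk_b))
for read in merged:
    print(read)
```

Sorting chunks independently and merging them afterwards is the pattern that lets this kind of work be split across many machines, since each chunk can be sorted without seeing the others.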

HDP Uses Real-World Data to Deliver Real-World Evidence

Real-World Evidence (RWE) promises to quantify improvements to health outcomes and treatments, but this data must be available at scale. High data storage and processing costs, challenges with merging structured and unstructured data, and an over-reliance on informatics resources for analysis-ready data have all slowed the evolution of RWE. With Hadoop, RWE groups are combining key data sources, including claims, prescriptions, electronic medical records, health information exchange (HIE) data, and social media, to obtain a full view of RWE. Analysts are delivering advanced analytic insights via cost-effective and familiar tools such as SAS®, R®, TIBCO™ Spotfire®, or Tableau®. RWE through Hadoop delivers value with:
• Optimal health resource utilization across different patient cohorts
• A holistic view of cost/quality tradeoffs
• Analysis of treatment pathways
• Competitive pricing studies
• Concomitant medication analysis
• Clinical trial targeting based on geographic & demographic prevalence of disease
• Prioritization of pipelined drug candidates
• Metrics for performance-based pricing contracts
• Drug adherence studies
• Permanent data storage for compliance audits

Perpetual Access to Raw Data from Prior Research

University researchers follow a “publish or perish” mantra, and pharmaceutical scientists run experiments on new compounds. While these activities produce a wealth of groundbreaking public and proprietary research, the voluminous data supporting this research can become too costly to store for access by all parties. Enter the cost-effective storage and processing power of Hadoop. Hortonworks Data Platform can provide a perpetual storage platform that retains valuable research data and makes it available for long-term querying, validation, and re-use. Researchers can leverage existing data for new analysis or enrich it with new types of data. With access to unprecedented volumes of data in its native format, scientists can search, query, and model pharmaceutical effectiveness with increased insight and confidence in their findings.