Geisinger immediately began the process of archiving and processing its 30 terabytes of patient data from Teradata into HDP. For most organizations, especially those in healthcare, the associated storage costs are a prohibitive factor. Geisinger, however, was able to save $2 million in EDW replacement costs and $500,000 in annual maintenance costs by eliminating the need to continue and expand its EDW platform. Geisinger also leveraged the ability to retain its existing Teradata queries and use them in the Hadoop SQL Workbench, which saved time and eased the transition.
Querying Unstructured Data
After Geisinger successfully on-boarded its structured data, attention turned to its unstructured data. A vast trove of medical records and doctor notes came into HDP in non-structured text format, but then had to be queried. Through the use of Apache Solr, the powerful open-source search platform that ships with HDP, Geisinger now runs queries on its unstructured data to derive analytical insights. Analysts now look at the sequence of patient visits, prescriptions, and medical records. Using Solr, clinicians and non-clinicians are able to search through 200 million patient note records in seconds to find relevant conditions and medications, which helps them analyze the success of treatments, identify areas of improvement, and determine ways to save time and money for both patients and providers.
Hadoop for Everybody
With the merging of structured and unstructured data, Geisinger tapped into a previously untapped well-spring of thought and innovation. Its initial trial of Teradata offload into Hadoop was limited to a few select users who only ran one SQL query. Soon there was a much larger demand for a wider variety of queries.
Geisinger also took advantage of the data governance and security features of HDP. This ensured success and compliance for every user who required it, and also ensured that data scientists who wanted to leverage similar functionalities such as R, but with the increasing scaling and performance of Apache Spark, could do so. The doctors and other hospital users who wanted to use Solr for querying unstructured data could accomplish this uninhibited, but according to their permissions. This level of data governance permits a fluid distribution of data to different users, so they can each make their own queries without affecting the queries of others. For instance, an ad hoc analytical study using Spark can instantly be spun up and down with no downtime or hassle.
This level of data governance permits a fluid distribution of data to different users, so that they can each make their own query without affecting the queries of others. For instance, an ad-hoc analytical study using Spark can instantly be spun up and down with no downtime or hassle.