Hadoop Architecture: Enterprise-Ready Analytics with Revelytix Loom
There are myriad use cases for Big Data applications across industries. For example, financial companies want to analyze governance data to assess levels of risk and compliance. Transportation companies want to analyze overall logistics for optimization. Oil and gas companies supplying energy want to predict machine failures to reduce the risk of outages. Insurance companies will need to analyze actuarial information in order to calculate individual policy premiums, a need driven by the impending Affordable Care Act.
Let’s take a closer look at how insurance companies are planning and preparing to handle such a massive change to the healthcare industry. Hortonworks Partner Revelytix introduces the challenge and their solution here.
The current analytic infrastructure of the healthcare industry and insurance companies requires an enormous investment before it can support the new analytic requirements facing the industry. Going forward, insurance companies will need to be able to:
- Handle the scale of the data volume
- Enable complex data cleansing and integration
- Enable analysts to work directly with data and produce data products
- Interoperate with existing IT
While every organization is different, their Big Data projects are often very similar. Hadoop, as a critical piece of emerging data architectures, is being used to collect massive amounts of data, ranging from website clickstreams to machine and sensor data. It is this data that is turning the conversation from “data analytics” to “big data analytics”.
By implementing Hortonworks Data Platform (HDP) with Apache Hive, insurance companies are building a scalable, cost-effective infrastructure and creating a “data lake” that lets them investigate their data iteratively for value. Adding Revelytix Loom, which runs alongside a Hadoop cluster of any size, enables the analysis needed to support the transition from primarily company plans to individual plans, and allows analysts to work directly in the Hadoop environment.
How it Works
The Hortonworks Data Platform architecture provides the foundation and, combined with Loom, gives companies the essential capabilities for an enterprise-ready analytics platform. The Loom Activescan feature crawls the cluster at scheduled intervals to discover and parse new files and new Hive databases, which are automatically registered in Loom as Sources. Activescan is pluggable, so specific Source recognizers and parsers can be defined for any file format. Once Sources are parsed correctly, they are converted to Loom-managed Datasets.
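To make the idea of a pluggable scan concrete, here is a minimal Python sketch of a recognizer registry. It is not Loom's code or API: the names Source, recognizer, and scan, the file paths, and the column names are all hypothetical, and a plain dictionary stands in for the cluster's file listing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Source:
    """Hypothetical record for a discovered file (not Loom's data model)."""
    path: str
    format: str
    columns: List[str]


# A recognizer inspects a path and the file's first line, and returns a
# Source if it understands the format, or None otherwise.
Recognizer = Callable[[str, str], Optional[Source]]
RECOGNIZERS: List[Recognizer] = []


def recognizer(fn: Recognizer) -> Recognizer:
    """Decorator that plugs a new recognizer into the registry."""
    RECOGNIZERS.append(fn)
    return fn


@recognizer
def recognize_csv(path: str, head: str) -> Optional[Source]:
    if not path.endswith(".csv"):
        return None
    return Source(path, "csv", head.strip().split(","))


@recognizer
def recognize_tsv(path: str, head: str) -> Optional[Source]:
    if not path.endswith(".tsv"):
        return None
    return Source(path, "tsv", head.strip().split("\t"))


def scan(files: Dict[str, str], registry: Dict[str, Source]) -> List[Source]:
    """One scan pass: register every file that is new and recognized."""
    discovered = []
    for path, head in sorted(files.items()):
        if path in registry:
            continue  # already registered by an earlier pass
        for recognize in RECOGNIZERS:
            source = recognize(path, head)
            if source is not None:
                registry[path] = source
                discovered.append(source)
                break
    return discovered
```

Calling scan repeatedly with the same registry mimics a scheduled crawl: only files that appeared since the previous pass are returned, and files that no recognizer understands are left unregistered until a matching recognizer is added.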
Loom Datasets have known, formal schemas, along with row- and column-level statistics generated by Activescan. Datasets are also actionable: they can be transformed using HiveQL through the Loom Workbench. Loom tracks the execution of all transformations and automatically generates lineage metadata. Lineage graphs show how Datasets are related to one another through the transformation executions that produced them.
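Lineage tracking of this kind can be pictured as a log of transformation executions, each linking input datasets to an output dataset. The Python sketch below illustrates the concept only. It does not reflect how Loom stores or exposes lineage, and the class names, dataset names, and HiveQL strings are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Execution:
    """One recorded transformation run (hypothetical structure)."""
    query: str
    inputs: List[str]
    output: str


@dataclass
class LineageLog:
    executions: List[Execution] = field(default_factory=list)

    def record(self, query: str, inputs: List[str], output: str) -> None:
        """Store which datasets a transformation read and which it wrote."""
        self.executions.append(Execution(query, list(inputs), output))

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that `dataset` derives from, directly or not."""
        parents: Dict[str, Set[str]] = {}
        for execution in self.executions:
            parents.setdefault(execution.output, set()).update(execution.inputs)
        seen: Set[str] = set()
        stack = [dataset]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


log = LineageLog()
log.record(
    "CREATE TABLE claims_clean AS "
    "SELECT * FROM claims_raw WHERE premium IS NOT NULL",
    inputs=["claims_raw"],
    output="claims_clean",
)
log.record(
    "CREATE TABLE premium_inputs AS "
    "SELECT c.*, m.age FROM claims_clean c "
    "JOIN members m ON c.member_id = m.member_id",
    inputs=["claims_clean", "members"],
    output="premium_inputs",
)
```

Here log.upstream("premium_inputs") walks back through both executions and reports claims_clean, members, and claims_raw, which is the kind of relationship a lineage graph makes visible.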
The Loom Workbench is used by data scientists, data engineers, and other Hadoop users to track, manage, and transform Hadoop-based data. The Loom API exposes all this functionality to third-party tools, so that users can make use of other products for data loading or transformation, while still tracking and managing data and transformations through Loom. This enables Loom to serve as the central dataset management platform for a cluster.
Thank you to our partner Revelytix for this Hadoop use case.
Revelytix is an innovator in the Hadoop space. Its products, such as Loom, are designed to increase the productivity of data scientists and other Hadoop users by providing the essential capabilities that make the Hortonworks Data Platform enterprise-ready as an analytics platform. Loom automates many data management tasks and collects detailed lineage metadata. Loom’s API provides a simple, robust access point into Hadoop data for third-party applications.