November 04, 2014

Improve Insight into Your Enterprise Data with Red Hat JBoss Data Virtualization and HDP

Back in September, we presented a three-part webinar series on our collaboration with Red Hat. Close to a thousand registrants and attendees took part and made the sessions highly interactive. The content included an overview of our strategic partnership, a couple of demos, and tutorials to get you started on your Big Data journey with Red Hat and Apache Hadoop.

In this blog, Kenneth Peeples, JBoss technology evangelist and principal marketing manager for Data Virtualization and Fuse Service Works at Red Hat, recaps the webinar series and offers insights into JBoss Data Virtualization and HDP. For those of you who missed one of our webinars (or want to review it), you can find recordings of all sessions on the Red Hat partner page.

I have had the privilege to work with the Hortonworks team designing, building and testing multiple use cases with Red Hat JBoss Data Virtualization (DV) and Hortonworks Data Platform (HDP). We designed and built four use cases that are easy to duplicate and run. During the three-webinar series with Red Hat and Hortonworks, all four use cases were covered. Webinar 2, Discover Red Hat and Apache Hadoop for the Modern Data Architecture, covers use cases 1 and 2, while Webinar 3, Discover Red Hat and Apache Hadoop for the Modern Data Architecture, covers use cases 3 and 4. This blog post covers the first use case and provides the sources and videos for you to try.

Introduction

The strategic alliance between Hortonworks and Red Hat makes it easier for organizations to adopt Apache Hadoop in the enterprise. The alliance has three main areas of collaboration:

  • Deep joint engineering to create a tightly integrated platform for next generation data applications
  • Joint go-to market activities
  • Collaborative support of the joint offerings

The primary benefits to the enterprise include:

  • Faster development of new data driven analytic applications
  • Improved flexibility and operations for Hadoop deployments

Several technologies are tightly integrated with HDP – Red Hat Storage, Enterprise Linux, OpenJDK, OpenStack Platform and Data Virtualization. Our use cases and webinars focused on HDP and DV.

Benefits of Using Data Virtualization with Hortonworks Data Platform

There are several benefits to using DV and HDP together.

  1. Virtual databases (VDBs) control access to data in a data lake while giving lines of business the autonomy they seek.
  2. New data in Hadoop is combined with data in traditional sources without moving or copying it.
  3. Standard client interfaces such as JDBC, ODBC, OData, and REST/SOAP web services allow a variety of business intelligence and analytic tools to be used.
  4. Performance is always a concern in the enterprise when working with large amounts of data, so caching is provided for faster access.
  5. Security of all data, especially sensitive data, is also a concern in the enterprise, so consistent security policies are provided across multiple data sources.

Data Virtualization Concepts

Before going into the use cases, I want to cover a couple of DV concepts. The upstream community projects for DV are Teiid and Teiid Designer for JBoss Developer Studio. Teiid is a set of open source enterprise information integration tools noted for their ability to rapidly create data services that can quickly adapt to changes in your IT environment.

A virtual database (VDB) is a container for components used to integrate data from multiple data sources, so that they can be accessed in an integrated manner through a single, uniform API. A VDB contains models, which define the structural characteristics of data sources, views, and web services. Once a VDB is created, it is deployed to the DV Server.

A Translator provides an abstraction layer between the Teiid query engine and a physical data source. It knows how to convert the query commands issued by Teiid into source-specific commands and execute them using the Resource Adapter. It also converts the result data returned by the physical source into the form that the Teiid query engine expects.

A Resource Adapter provides the connectivity to the physical data source. It also gives a native way to issue commands to the source and gather results. A Resource Adapter can connect to an RDBMS data source, a web service, a text file, a mainframe, or a custom source that you define. A Translator, along with its Resource Adapter, must be configured on a Source Model.

Several components worth highlighting for DV are described below:

  • Engine – The heart of Teiid is a high-performance query engine that processes relational, XML, XQuery and procedural queries from federated data sources. Features include support for homogeneous schemas, heterogeneous schemas, transactions, and user defined functions. The server is an enterprise-ready, scalable, manageable, runtime query engine that runs inside JBoss AS that provides additional security, fault-tolerance, and administrative features.
  • JDBC and ODBC Drivers – Allow applications to access the query engine. We use the Teiid JDBC driver in our use cases.
  • Connectors – Teiid includes a rich set of Translators and Resource Adapters that enable access to a variety of sources, including most relational databases, web services, text files, and LDAP. Need data from a different source? Custom Translators and Resource Adapters can easily be developed.
  • Tools – Teiid Designer, for defining virtual databases containing views, procedures or even dynamic XML documents, and the Teiid Web Console in the application server.
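
To make the division of labor between a Translator and a Resource Adapter more concrete, here is a minimal conceptual sketch in Python. It is not Teiid code and does not use any Teiid API: the class names (EngineQuery, InMemoryResourceAdapter, ToyTranslator), the dictionary-backed "source", and the column name mapping are all made up for illustration. The adapter only knows how to reach the source and run a native command, while the translator turns an engine-level query into that native command and reshapes the results.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class EngineQuery:
    """Hypothetical source-neutral query, as a query engine might issue it."""
    table: str
    columns: List[str]


class InMemoryResourceAdapter:
    """Hypothetical adapter: provides connectivity to a 'physical' source.

    The source here is just a dict of tables, and its native command
    format is a (table_name, field_names) tuple.
    """

    def __init__(self, tables: Dict[str, List[Dict[str, Any]]]) -> None:
        self._tables = tables

    def execute_native(self, command: Tuple[str, List[str]]) -> List[Dict[str, Any]]:
        table, fields = command
        return [{field: row[field] for field in fields} for row in self._tables[table]]


class ToyTranslator:
    """Hypothetical translator: engine query -> native command -> engine rows."""

    def __init__(self, adapter: InMemoryResourceAdapter, name_map: Dict[str, str]) -> None:
        self._adapter = adapter
        self._name_map = name_map  # engine column name -> source field name

    def execute(self, query: EngineQuery) -> List[Tuple[Any, ...]]:
        # Convert the engine-level column names into source-specific ones.
        native_fields = [self._name_map[column] for column in query.columns]
        # Delegate the actual source access to the resource adapter.
        rows = self._adapter.execute_native((query.table, native_fields))
        # Convert source rows (dicts) into the tuple form the engine expects.
        return [tuple(row[field] for field in native_fields) for row in rows]


if __name__ == "__main__":
    # Made-up data, for illustration only.
    adapter = InMemoryResourceAdapter(
        {"sales": [{"ctry": "US", "tix": 100}, {"ctry": "UK", "tix": 40}]}
    )
    translator = ToyTranslator(adapter, {"country": "ctry", "tickets_sold": "tix"})
    print(translator.execute(EngineQuery("sales", ["country", "tickets_sold"])))
```

In this toy model, swapping the adapter for one that talks to a different source leaves the engine-facing query unchanged, which is the point of the abstraction described above.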

Use Cases

We designed the material, such as the how-to guide, videos and tutorials, to make it easy to see the demonstrations in action and to duplicate them yourself. The software required to run these examples is:

So let’s go through each use case to provide an overview, more technical detail and references to get you started.

Use Case 1 from Webinar 2 – Sentiment Analysis and Sales Analysis

This use case is the sentiment analysis and sales analysis with Hadoop and MySQL. It uses one Hortonworks Data Platform VM for the Twitter sentiment data and one MySQL database for the sales data.

  • Objective: Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales.
  • Problem: Cannot utilize social data and sentiment analysis with the sales management system.
  • Solution: Leverage JBoss Data Virtualization to mash up sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data.
  • Demonstration Detail: The MySQL database contains the sales data by country with population, number of tickets sold, ticket revenue and merchandise sales. This is our relational source. The HDP contains the Twitter data of individual tweets with a timestamp, tweet text, country and sentiment value. We average the sentiment values of the tweets by country. The VDB contains the Models and the Unified View that aggregates the data from both sources. We then access the VDB from Microsoft Excel through the Teiid ODBC driver, and from the SQuirreL client and the DV Dashboard through the Teiid JDBC driver.
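
The logic of the Unified View (average the tweet sentiment by country, then join it to the sales data by country) can be sketched in a few lines of Python. This is only an illustration of the query shape, not the actual VDB from the demonstration: it uses an in-memory SQLite database to stand in for both sources, and the table names, column names, and sample values are all assumptions made up for this example.

```python
import sqlite3
from typing import List, Tuple


def build_unified_view(tweets: List[Tuple], sales: List[Tuple]) -> List[Tuple]:
    """Join per-country average sentiment with per-country sales.

    Both inputs are loaded into one in-memory SQLite database purely for
    illustration; in the demonstration they live in HDP and MySQL.
    Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tweets (ts TEXT, tweet_text TEXT, country TEXT, sentiment REAL)"
        )
        conn.execute(
            "CREATE TABLE sales (country TEXT, population INTEGER, "
            "tickets_sold INTEGER, ticket_revenue REAL, merchandise_sales REAL)"
        )
        conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", tweets)
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", sales)
        # Average sentiment per country, then join to the sales rows.
        return conn.execute(
            """
            SELECT s.country, s.tickets_sold, s.ticket_revenue,
                   s.merchandise_sales, t.avg_sentiment
            FROM sales AS s
            JOIN (SELECT country, AVG(sentiment) AS avg_sentiment
                  FROM tweets
                  GROUP BY country) AS t
              ON t.country = s.country
            ORDER BY s.country
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample_tweets = [
        ("2013-05-03T10:00:00", "loved it", "US", 2.0),
        ("2013-05-03T11:00:00", "it was fine", "US", 1.0),
        ("2013-05-04T09:30:00", "not for me", "UK", 0.0),
    ]
    sample_sales = [
        ("US", 1000, 100, 1000.0, 200.0),
        ("UK", 500, 40, 400.0, 50.0),
    ]
    for row in build_unified_view(sample_tweets, sample_sales):
        print(row)
```

With DV, the same kind of join is expressed once in the view definition, and clients such as Excel or SQuirreL simply query the resulting view.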

Demonstration References

Additional Resources

To learn more, listen to the replays of the webinars listed below and look at the Red Hat page on Hortonworks.com:
