October 12, 2017

Big SQL: SQL on Apache Hadoop Across the Enterprise

Guest author Nagapriya Tiruthani, Offering Manager, IBM Big SQL, IBM

Why Big SQL?

The enterprise data warehouse (EDW) emerged as a logical home for enterprise data, capturing the essence of all enterprise systems. But in recent years there has been an explosion of data captured from social media, sensors, and other sources. This rapid growth has put tremendous pressure on traditional systems, which run out of space quickly. Businesses are challenged to handle this growth effectively and efficiently without giving up their ability to derive existing insights, while also enhancing existing business logic to identify new opportunities.

One way to handle rapid data growth is to offload data to Hadoop, freeing up space in an existing data warehouse, or to load raw data directly into Hadoop. Hadoop is a highly scalable, low-cost storage platform where jobs are distributed across all the servers (nodes) in the cluster and run in parallel. When these options are discussed with the business as data warehouses overflow, the immediate concern is whether queries can run against data that is distributed across relational databases and Hadoop. This sets the stage for IBM Big SQL.

What is Big SQL?

IBM has invested decades of research in building a robust engine that efficiently executes even complex queries over relational data. Big SQL leverages that same engine, adapted to handle Hadoop data. Among Big SQL's inherent strengths are an advanced SQL compiler and a cost-based optimizer for efficient query execution. Combining these strengths with a massively parallel processing (MPP) engine distributes query execution across the nodes in a cluster.
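To make the idea of distributing query execution across nodes more concrete, here is a minimal Python sketch of a scatter-gather aggregation. It is not Big SQL code and does not reflect how the Big SQL engine is implemented; the partitions, the function names, and the use of threads as stand-ins for worker nodes are all assumptions made for illustration. Each "node" computes a partial GROUP BY sum over its own slice of the data, and a coordinator merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per worker node: (region, amount) rows.
PARTITIONS = [
    [("EMEA", 10), ("APAC", 5)],
    [("EMEA", 7), ("AMER", 3)],
    [("APAC", 2), ("AMER", 8)],
]


def partial_sum(partition):
    """Work done on one node: aggregate only the local rows."""
    totals = {}
    for region, amount in partition:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge(partials):
    """Work done by the coordinator: combine the per-node subtotals."""
    final = {}
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] = final.get(region, 0) + subtotal
    return final


def distributed_group_by_sum(partitions):
    """Scatter the aggregation across workers, then gather and merge."""
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return merge(partials)


totals = distributed_group_by_sum(PARTITIONS)
print(totals)
```

The point of the sketch is that only small per-node subtotals travel to the coordinator, rather than every row, which is the general reason an MPP design scales with the number of nodes.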


Why is IBM Big SQL an attractive option for data on Hadoop?

IBM Big SQL brings to the Hadoop ecosystem advanced SQL query engine capabilities that until now were typically available only for relational databases. Some of the core strengths of Big SQL are SQL compatibility, ANSI SQL compliance, federation, high performance, high concurrency, data security, automatic workload management, automatic memory management, and application portability, among others.

Many SQL engines on Hadoop claim to be ANSI SQL compliant. But why is SQL compatibility important? Relational databases generally follow the ANSI SQL standard, and each adds its own SQL extensions that differentiate it from other relational databases. In data warehouse offload use cases, the data comes from Oracle, Db2, Netezza, or a similar relational data warehouse. Businesses have invested in developing applications that generate reports or insights against those warehouses. When you offload that data to Hadoop, what happens to those applications? Can they be reused?

Offload data from Oracle, Db2 or Netezza

To smooth over these differences between SQL dialects, Big SQL understands not only generic ANSI SQL but also the SQL extensions specific to Oracle, Db2, and Netezza. Therefore, when you offload data from Oracle, Db2, or Netezza, applications can be ported easily without any changes. This simplifies the planning and execution of data warehouse offloading and reduces the time and money spent rewriting applications to work on Hadoop. Another advantage is that the SQL skills engineers already possess can be used against Hadoop.

Federation capability

In addition, Big SQL lets you not only query data on Hadoop efficiently, but also combine it with data spread across different enterprise data warehouses. The federation capability of Big SQL also lets you push down predicates to the remote source, so that filtering happens there. Therefore, not all data moves back and forth between the systems; only the rows that satisfy the predicates are sent back to be combined with Hadoop data.
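The effect of predicate pushdown described above can be illustrated with a small, self-contained Python sketch. It does not use Big SQL or any of its APIs: an in-memory SQLite database stands in for a remote warehouse, a Python list stands in for a table on Hadoop, and all table names, column names, and rows are made up for illustration. The sketch compares pulling every remote row and filtering locally against sending the filter to the remote source.

```python
import sqlite3


def make_remote_warehouse():
    """Build an in-memory SQLite database standing in for a remote warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "EMEA"), (2, "Globex", "APAC"),
         (3, "Initech", "EMEA"), (4, "Umbrella", "AMER")],
    )
    return conn


# Hypothetical rows standing in for a table stored on Hadoop.
HADOOP_CLICKS = [
    {"customer_id": 1, "clicks": 120},
    {"customer_id": 2, "clicks": 75},
    {"customer_id": 3, "clicks": 40},
    {"customer_id": 4, "clicks": 300},
]


def fetch_without_pushdown(conn, region):
    """Pull every remote row across, then filter on the local side."""
    rows = conn.execute("SELECT id, name, region FROM customers").fetchall()
    kept = [row for row in rows if row[2] == region]
    return kept, len(rows)  # len(rows) = rows that crossed the "network"


def fetch_with_pushdown(conn, region):
    """Send the predicate to the remote source; only matches come back."""
    rows = conn.execute(
        "SELECT id, name, region FROM customers WHERE region = ?", (region,)
    ).fetchall()
    return rows, len(rows)


def join_with_hadoop(remote_rows):
    """Combine the remote rows with the local 'Hadoop' data by customer id."""
    clicks = {row["customer_id"]: row["clicks"] for row in HADOOP_CLICKS}
    return sorted(
        (name, clicks[cid]) for cid, name, _ in remote_rows if cid in clicks
    )


conn = make_remote_warehouse()
naive_rows, naive_transferred = fetch_without_pushdown(conn, "EMEA")
pushed_rows, pushed_transferred = fetch_with_pushdown(conn, "EMEA")
result_naive = join_with_hadoop(naive_rows)
result_pushed = join_with_hadoop(pushed_rows)
print(result_pushed, naive_transferred, pushed_transferred)
```

Both paths produce the same joined result, but the pushdown path moves only the matching rows out of the remote source. In a real federated setup the savings grow with the size of the remote table and the selectivity of the predicate.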

SQL compatibility and ANSI SQL compliance

This feature is a big plus for businesses that have invested a lot of time and money in building a comprehensive enterprise data warehouse, because that investment does not become obsolete. Existing applications and SQL skills carry over to data on Hadoop, so adding a Hadoop warehouse or data lake to the enterprise does not require additional time and money for them. The SQL compatibility and ANSI SQL compliance of the Big SQL engine enable a seamless transfer of applications and SQL skills to execute SQL and PL/SQL statements.

Through the Hortonworks partnership, IBM Big SQL is tightly integrated with Hortonworks Data Platform (HDP) to provide businesses with a robust, reliable, and resilient environment in which to maximize existing business and identify new business opportunities.

Some useful links:

A short video on IBM Big SQL: https://www.youtube.com/watch?v=fMaEeNsyrgE

To learn more about HDP and IBM Big SQL, and to try the Sandbox with tutorials, go to https://hortonworks.com/partners/ibm-bigsql/ or https://www.ibm.com/us-en/marketplace/big-sql
