Recently the Oracle Data Integrator products were certified on the Hortonworks Data Platform version 2.1, and we’re delighted to be working more closely with Oracle engineering on these kinds of efforts. We’re happy to bring you this guest blog, written by Alex Kotopoulis, Product Manager for Oracle Data Integration for Big Data at Oracle, discussing the recent integration and certification initiatives. You can learn more by joining our webinar on November 11; register here.
At the core of every Big Data environment is the ability to move and transform data at scale. The Hortonworks Data Platform (HDP) provides a rich selection of Apache Hadoop technologies to accomplish this, such as YARN, HDFS, Hive, Sqoop, Pig, Spark, Flume, and others. Oracle Data Integrator is in a unique position to harness these native Hadoop technologies, providing declarative design and management of these processes without the need for a proprietary transformation engine or manual coding.
In Oracle Data Integrator (ODI), users define data movement and transformations by graphically creating logical mappings. Users create a flow from sources to targets across different technologies, including relational databases, applications, XML, JSON, Hive tables, HBase, and HDFS files. They can also insert filters, joins, aggregates, and other transformation components. This logical design is done without committing to any particular implementation of the data movement or transformation.
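To make the idea concrete, here is a minimal sketch of what such a logical mapping looks like as pure metadata, with no commitment to an execution engine. All table and field names are illustrative assumptions, not ODI syntax:

```python
# Hypothetical sketch: a logical mapping described purely as data.
# Names ("orders", "customers", etc.) are illustrative, not from ODI.

logical_mapping = {
    "sources": [
        {"name": "orders",    "technology": "HDFS file"},
        {"name": "customers", "technology": "Hive table"},
    ],
    "components": [
        {"type": "filter",    "on": "orders", "condition": "amount > 100"},
        {"type": "join",      "left": "orders", "right": "customers",
         "condition": "orders.cust_id = customers.id"},
        {"type": "aggregate", "group_by": ["customers.region"],
         "measures": ["sum(orders.amount)"]},
    ],
    "target": {"name": "regional_totals", "technology": "Hive table"},
}

def technologies_used(mapping):
    """List the distinct technologies the mapping touches."""
    techs = {s["technology"] for s in mapping["sources"]}
    techs.add(mapping["target"]["technology"])
    return sorted(techs)

print(technologies_used(logical_mapping))  # → ['HDFS file', 'Hive table']
```

Because the design is just declarative metadata, the same mapping could later be bound to different physical implementations (HiveQL, Pig, Spark, and so on) without being rewritten.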
After the logical design is complete, ODI suggests a physical implementation, which includes mechanisms to move data, such as Sqoop or Oracle Loader for Hadoop, and the engine where transformations are executed, such as Hive or a source or target relational database. ODI generates code for each of the underlying technologies using the Knowledge Module concept: reusable, user-editable code templates that encapsulate the best practices of the user’s development organization. The ODI user is insulated from the details of the underlying Hadoop languages and configuration files that would otherwise have to be developed manually.

The code is executed in the native Hadoop tools and engines; unlike other data integration tools, ODI does not use a proprietary ETL engine or require dedicated hardware for it. The lightweight ODI agent may be installed directly on the NameNode or, alternatively, on a small host off-cluster. No proprietary code needs to run on any Hadoop DataNodes. By using Hadoop-native engines for all movement and transformation, the user gains the full benefit of the Hadoop feature set, has complete visibility into the generated code and its behavior, and is not bound by the limitations of a non-standard execution environment. Because the generated processes run on YARN-integrated engines, they also execute in a resource-optimized manner alongside other workloads, maximizing cluster usage and performance.
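The Knowledge Module idea can be sketched as a simple template-driven code generator: the template carries the reusable best practice, and the mapping's metadata fills in the specifics. The template text and field names below are illustrative assumptions, not actual Knowledge Module syntax:

```python
# Hypothetical sketch of the Knowledge Module concept: a reusable,
# user-editable code template that generates engine-specific code
# (here, a HiveQL statement) from a mapping's metadata.
# The template and metadata fields are illustrative, not ODI syntax.
from string import Template

HIVE_INSERT_KM = Template(
    "INSERT OVERWRITE TABLE $target\n"
    "SELECT $columns\n"
    "FROM $source\n"
    "WHERE $filter"
)

def generate_hive_code(meta):
    """Fill the template with one mapping's metadata."""
    return HIVE_INSERT_KM.substitute(
        target=meta["target"],
        columns=", ".join(meta["columns"]),
        source=meta["source"],
        filter=meta["filter"],
    )

meta = {
    "target": "sales_summary",
    "source": "raw_sales",
    "columns": ["region", "sum(amount) AS total"],
    "filter": "amount > 0",
}
print(generate_hive_code(meta))
```

Editing the template changes the generated code for every mapping that uses it, which is how a development organization's conventions propagate without manual coding.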
ODI’s declarative methodology of separating logical design from physical implementation makes it easy to stay current with the latest Hadoop tools. As the underlying data movement and transformation technologies are enhanced and new projects emerge, new Knowledge Modules can be developed to execute your existing logical designs using the latest Hadoop capabilities.
Oracle Data Integration makes Big Data Integration on Hortonworks better by providing declarative and native data movement, fast replication from relational databases using Oracle GoldenGate, complete data lineage through Oracle Enterprise Metadata Manager, and best-in-class Hadoop integration with Oracle Database using Oracle Big Data Connectors.
Join us for the joint Oracle and Hortonworks Webinar to learn more about these integration technologies. Register here.
You can learn more about the Oracle relationship with Hortonworks here.