Attunity is a long-time Hortonworks partner that provides data optimization and data integration software to help Hortonworks customers address exploding data growth, efficiently manage the performance of BI and data warehouse systems, and realize the tremendous economies of Apache Hadoop®. Attunity solutions are certified on HDF and HDP and are YARN Ready. Together, Hortonworks and Attunity are committed to advancing Hadoop through community-led innovation. This new solution is one more example of that.
By Jordan Martz, Director of Technology Solutions, Attunity
Attunity Compose for Hive automates the data pipeline to create analytics-ready data by leveraging the latest innovations in Hadoop such as the new ACID Merge SQL capabilities, available today in Apache Hive™ (included in HDP 2.6), to automatically and efficiently process data insertions, updates and deletions.
Attunity Compose for Hive was announced at the DataWorks Summit 2017 in San Jose, CA. Itamar Ankorion, Chief Marketing Officer at Attunity, explained: “We help large corporations around the world implement strategic data lake initiatives by making data available in real-time for analytics and enabling them to overcome the inherent challenges associated with building modern data systems. Attunity Compose for Hive directly addresses these challenges to automate the implementation of Hive. It works by eliminating complex and lengthy manual development work for faster and more efficient implementation of analytics-ready data sets.”
Attunity Compose for Hive automates the creation, loading and transformation of data into Hadoop Hive structures. It fully automates the pipeline of business intelligence (BI) ready data into Hive, to create both Operational Data Stores (ODS) and Historical Data Stores (HDS). Attunity Replicate integrates with Attunity Compose to accelerate data ingestion, data landing, SQL schema creation, data transformation and ODS & HDS creation/updates.
With Attunity Compose for Hive, you have:
Step 1: Use Attunity Replicate to ingest data into Hadoop and partition the data
Attunity Replicate transfers data into Hadoop and the HDFS file system using parallelized transfers via the WebHDFS and HttpFS protocols or over NFS, and connects to HCatalog via ODBC and HQL scripts. As data is loaded into Hadoop, it is partitioned, which creates the metadata needed to deliver consistent, transactionally verified datasets. Data files are uploaded to HDFS according to the maximum size and time definitions and are then stored in a directory under the change table directory. Whenever the specified partition timeframe ends, a partition is created in Hive that points to the HDFS directory.
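The partitioning rule described above can be sketched in a few lines of Python. This is only an illustration and not Attunity Replicate's implementation: the directory layout, the `partition_name` column, the size and age thresholds, and all function names are assumptions made for the example. The generated statement uses Hive's standard `ALTER TABLE ... ADD PARTITION ... LOCATION` syntax.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def should_roll_file(size_bytes, age, max_bytes, max_age):
    """Close the current data file once it hits the size or time limit."""
    return size_bytes >= max_bytes or age >= max_age

def partition_start(ts, window):
    """Round a timestamp down to the start of its partition timeframe."""
    whole_windows = (ts - EPOCH) // window
    return EPOCH + whole_windows * window

def partition_dir(change_table_dir, ts, window):
    """Directory under the change table directory for this timeframe (assumed layout)."""
    return f"{change_table_dir}/{partition_start(ts, window):%Y%m%dT%H%M%S}"

def add_partition_sql(table, change_table_dir, ts, window):
    """Hive DDL that registers a partition pointing at the HDFS directory."""
    name = f"{partition_start(ts, window):%Y%m%dT%H%M%S}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (partition_name='{name}') "
        f"LOCATION '{change_table_dir}/{name}'"
    )

if __name__ == "__main__":
    window = timedelta(hours=1)
    ts = datetime(2017, 6, 13, 10, 42, 7)
    # Hypothetical table and path names, used only for illustration.
    print(add_partition_sql("orders__ct", "/data/orders__ct", ts, window))
```

With a one-hour timeframe, every change written between 10:00 and 11:00 lands in the same directory, and a single partition is registered for it when the hour closes.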
Step 2: Connect to the Hadoop cluster and configure the CDC and ETL processes
The images below showcase the connections into Hive and into the source database, Northwind, a MySQL instance.
By optionally storing the history of changes through the Manage Metadata -> Save Changes screen, you can choose to design either an Operational or a Historical Data Store.
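To make the difference between the two store types concrete, here is a minimal Python sketch of how the same change stream could populate each one. An ODS keeps only the current row per key, while an HDS keeps every version with a validity range. The record layout (`op`, `key`, `ts`, `data`) and the function names are assumptions for illustration and do not reflect Compose's internal model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    op: str                      # 'I' insert, 'U' update, 'D' delete
    key: int                     # primary key of the source row
    ts: int                      # change order (any increasing value)
    data: Optional[dict] = None  # row contents; None for deletes

def build_ods(changes):
    """Operational store: the latest state of each key, deleted keys removed."""
    ods = {}
    for c in sorted(changes, key=lambda c: c.ts):
        if c.op == "D":
            ods.pop(c.key, None)
        else:
            ods[c.key] = c.data
    return ods

def build_hds(changes):
    """Historical store: one row per version, closed when superseded or deleted."""
    hds = []
    open_version = {}  # key -> index of the currently open version in hds
    for c in sorted(changes, key=lambda c: c.ts):
        idx = open_version.pop(c.key, None)
        if idx is not None:
            hds[idx]["valid_to"] = c.ts      # close the previous version
        if c.op != "D":
            hds.append({"key": c.key, "data": c.data,
                        "valid_from": c.ts, "valid_to": None})
            open_version[c.key] = len(hds) - 1
    return hds

if __name__ == "__main__":
    stream = [
        Change("I", 1, 1, {"city": "Oslo"}),
        Change("U", 1, 2, {"city": "Bergen"}),
        Change("I", 2, 3, {"city": "Rome"}),
        Change("D", 2, 4),
    ]
    print(build_ods(stream))
    for row in build_hds(stream):
        print(row)
```

The ODS answers "what does the source look like now", while the HDS answers "what did it look like at a given time", which is why saving the history of changes is what makes the historical option possible.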
Step 3: Generate Hive LLAP code for loading data
Attunity Compose considers these key items while generating Hive ETL calls:
When changes are made in the source system, the data is delivered to the '[table]_delivery' zone, which serves as the final presentation layer.
Audits are carried throughout the process in a separate set of tables: the '[table]_landing' Hive tables hold per-record audit data, together with the change tables and a record of each table's partitions. The CDC partitions are logged in 'attrep_cdc_partitions', which records when changes hit those partitions.
Reviewing the content shows the latest merged data. The most recent updates and merges can be traced through the 'I' (Insert) and 'U' (Update) statements, together with the additional reconciliation step that is appended to the process wherever a delete occurred.
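The reconciliation of inserts, updates and deletes maps naturally onto Hive's ACID `MERGE` statement. The Python sketch below builds such a statement as a string. It is not the code that Compose generates: the table names, the `op` column holding 'I', 'U' or 'D', and the helper name are hypothetical, and the source table is assumed to hold only the latest change per key, since Hive rejects a merge in which several source rows match one target row.

```python
def build_merge_sql(target, source, key, columns, op_col="op"):
    """Return a Hive MERGE statement that applies I/U/D changes to `target`.

    Hive requires that when both DELETE and UPDATE clauses are present,
    the first one carries an extra condition, and that WHEN NOT MATCHED
    comes last. The clause order below follows those rules.
    """
    non_key = [c for c in columns if c != key]
    set_clause = ", ".join(f"{c} = S.{c}" for c in non_key)
    values = ", ".join(f"S.{c}" for c in columns)
    return "\n".join([
        f"MERGE INTO {target} AS T",
        f"USING {source} AS S",
        f"ON T.{key} = S.{key}",
        f"WHEN MATCHED AND S.{op_col} = 'D' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND S.{op_col} <> 'D' THEN INSERT VALUES ({values})",
    ])

if __name__ == "__main__":
    # Hypothetical names: a delivery table and a view of the latest changes.
    print(build_merge_sql(
        target="orders_delivery",
        source="orders_latest_changes",
        key="order_id",
        columns=["order_id", "status", "amount"],
    ))
```

Deletes are handled first so that a matched row marked 'D' is removed rather than updated, and a delete for a key that never reached the target is simply skipped by the final clause.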
Step 4: Configure the parallelism and optimizations needed
To avoid overloading the Hadoop cluster, a run can be throttled by limiting the number of SQL statements executed at the same time. In the Manage ETL Set screen, go to ETL Commands, then Settings, then Advanced, and set the maximum number of concurrent database connections to use.
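The idea behind this setting can be sketched in Python with a bounded worker pool: no matter how many statements are queued, only a fixed number run at once. This is an illustration of the throttling technique, not Compose's implementation. The `execute` callable stands in for whatever submits a statement to Hive, and `max_connections` mirrors the "max concurrent DB connections" value; both names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_throttled(statements, execute, max_connections):
    """Run every statement, with at most `max_connections` in flight.

    Results are returned in the same order as `statements`.
    """
    if max_connections < 1:
        raise ValueError("max_connections must be at least 1")
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(execute, statements))

if __name__ == "__main__":
    def fake_execute(sql):
        # Placeholder for a real Hive call; it only echoes the statement.
        return f"done: {sql}"

    queue = [f"-- ETL statement {i}" for i in range(6)]
    for result in run_throttled(queue, fake_execute, max_connections=2):
        print(result)
```

A lower limit protects a shared cluster at the cost of a longer run, while a higher limit shortens the run if the cluster has spare capacity.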
Step 5: Show the data through Hive
Finally, the reconciled delivery zone of data is presented through Hive.
In summary, the business benefits of Attunity Compose for Hive are:
Try Attunity Compose for Hive — To learn more or participate in the beta program, please click here.