Smarter ETL with Hadoop and Syncsort

Syncsort, a technology partner of Hortonworks, helps organizations propel Hadoop projects with a tool that makes it easy to “Collect, Process and Distribute” data with Hadoop. This process, often called ETL (Extract, Transform, Load), is one of the key drivers for Hadoop initiatives. But why is this technology a key enabler of Hadoop? To find out, we talked with Syncsort’s Director of Strategy, Steve Totman, a 15-year veteran of data integration and warehousing, who provided his perspective on data warehouse staging areas.

Apache Hadoop has emerged as the de facto operating system for Big Data applications, enabling customers to deal with the avalanche of data coming from logs, email, machine data such as sensor readings, mobile devices, social media and more.

While business intelligence systems are typically the last stop in extracting value from Big Data, the first stop in EVERY data warehouse, and one that is rarely talked about openly, is the data staging area. It is precisely here that almost all of the hard work and heavy lifting occurs. That’s the dirty little secret of any organization with a warehouse. Ten years ago, traditional data integration tools promised a simple way to take data from multiple, disparate sources, transform it in the staging area into critical insights, and load it into a warehouse where business users would leverage it for competitive advantage. But the truth is, these tools are breaking today. Big Data has already outgrown the scalability of traditional ETL and is even jeopardizing SLAs on the data warehouse itself.
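To make the staging-area idea concrete, here is a minimal sketch of the extract, stage, transform and load steps described above, written in plain Python. It is an illustration only, not Syncsort's product or a Hadoop job: the two inline CSV strings, the `extract`, `transform` and `load` helpers, and the in-memory `warehouse` dictionary are all hypothetical stand-ins.

```python
import csv
import io

# Hypothetical raw inputs standing in for two disparate sources.
WEB_LOG_CSV = "user_id,amount\n1,10.50\n2,abc\n1,4.50\n"
MOBILE_CSV = "user_id,amount\n2,7.00\n3,1.25\n"


def extract(raw_text, source_name):
    """Read one source into staging rows, tagging each row with its origin."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [dict(row, source=source_name) for row in reader]


def transform(staged_rows):
    """Clean the staged rows and total the amounts per user."""
    totals = {}
    for row in staged_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        user = int(row["user_id"])
        totals[user] = totals.get(user, 0.0) + amount
    return totals


def load(totals, warehouse):
    """Write the aggregated results into the in-memory 'warehouse' table."""
    warehouse.update(totals)
    return warehouse


# The staging area holds every extracted row, clean or not.
staging_area = extract(WEB_LOG_CSV, "web") + extract(MOBILE_CSV, "mobile")
warehouse = load(transform(staging_area), {})
print(warehouse)
```

Even in this toy version, nearly all of the logic (parsing, tagging, dropping bad rows, aggregating) sits between extraction and loading, which is the heavy lifting the staging area is responsible for.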

Hadoop promises to change all of this. That’s why organizations are looking at Hadoop as a means to extend and heal their data warehousing architectures.


At Hadoop Summit, the concept of a data lake or reservoir was discussed as the area to capture and store ALL data in one place and interact with that data in multiple ways. Interestingly, the data lake is exactly what the ultimate data warehouse staging area was envisioned to be and what people have wanted since data warehouses began.

With comprehensive connectivity, a graphical development environment for building MapReduce jobs without coding, and faster MapReduce, Syncsort’s data solution for Hadoop to “Collect, Process and Distribute” is everything you need to turn Hadoop into the most scalable, most affordable and easiest-to-use data integration environment. No coding, no tuning; just smarter ETL.

In the end, Syncsort and Hortonworks together can help you leverage Hadoop to turn your staging area from that dirty little secret you’re afraid to confess into the shining star you are proud to show off.

To get started with Hadoop and Syncsort, you can see this demonstration: Deploying Hadoop ETL in the Hortonworks Sandbox 

Find out more about a modern data architecture for your organization.

