SnapReduce 2.0 Leverages YARN to Bring iPaaS to Hadoop

Hortonworks certifies SnapLogic on HDP 2.1

More and more solution providers are integrating with Hortonworks Data Platform to provide their customers with enterprise Hadoop.

As part of our HDP 2.1 certification series, I would like to introduce Greg Benson, Chief Scientist at SnapLogic. In this blog, Greg shares insights into the value of obtaining HDP 2.1 certification and the benefits of integration platform as a service (iPaaS).

SnapLogic provides a cloud-based service for performing a wide range of data and application integration tasks. We recently introduced SnapReduce 2.0, which combines SnapLogic’s elastic integration platform as a service (iPaaS) with both on-premises and cloud-based Hadoop clusters. SnapReduce 2.0 is YARN compliant and has achieved Hortonworks Data Platform (HDP) 2.1 certification.

SnapReduce 2.0 allows customers to further leverage their investments in Hadoop by harnessing Hadoop resources for data integration tasks alongside other Hadoop applications. Application and data integration tasks can now scale to the capacity of your Hadoop cluster as needed. In addition, SnapReduce 2.0 makes it easier to both acquire and deliver Hadoop data using a graphical Designer and Snap connectivity to a wide range of applications and data stores.

SnapLogic’s elastic iPaaS supports application integration, API integration, and conventional extract, transform, and load (ETL) use cases. Native support for hierarchical documents is also fundamental to SnapLogic; it can be used to easily create JSON data files in HDFS, as well as line-oriented records as needed. SnapLogic’s modern HTML5-based Designer makes it easy to acquire and deliver Hadoop data without programming. This is especially useful for database users and data scientists who want to utilize Hadoop data but are not skilled programmers.
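To make the hierarchical-document point concrete, here is a minimal Python sketch (the record shape and field names are invented for illustration, and the output goes to stdout rather than HDFS) of serializing nested documents as newline-delimited JSON, the line-oriented layout commonly used for files landed in HDFS:

```python
import json

# Hypothetical nested records of the kind an integration pipeline
# might land in HDFS; the field names here are invented.
records = [
    {"id": 1, "user": {"name": "Ada", "region": "EMEA"}, "events": ["login", "query"]},
    {"id": 2, "user": {"name": "Grace", "region": "AMER"}, "events": ["login"]},
]

def to_json_lines(docs):
    """Serialize documents as newline-delimited JSON: one complete
    JSON object per line, so each record stays hierarchical while
    the file remains line-oriented."""
    return "\n".join(json.dumps(d, sort_keys=True) for d in docs)

print(to_json_lines(records))
```

Each line is independently parseable, which is why this layout works well for record-at-a-time processing in Hadoop.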

YARN is the architectural center of Hadoop, controlling access to critical Hadoop resources and turning Hadoop into a multi-application platform. SnapLogic achieves Hadoop cluster utilization through our YARN-based Snaplex. A Snaplex is a collection of containers that can execute SnapLogic data flow pipelines. Pipeline design and execution are controlled from our cloud-based “control plane,” while the YARN-based Snaplex coordinates with both the YARN ResourceManager and that control plane. In this way we can apply our scale-out and scale-in algorithms to Snaplex nodes running in Hadoop. SnapLogic’s approach also eliminates the need for software updates because our connectors, called Snaps, are dynamically downloaded and cached as needed. Even the Snaplex container will auto-update if desired. Essentially, a customer can easily extend their Hadoop resources for data integration as needed.
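As an illustration of the scale-out/scale-in idea (this is not SnapLogic’s actual algorithm; the function, its parameters, and the capacity model are invented here), a minimal Python sketch of the kind of elasticity decision a controller might make looks like:

```python
def desired_nodes(queued_pipelines, per_node_capacity, min_nodes=1, max_nodes=10):
    """Hypothetical elasticity policy: return how many worker containers
    to request, given queued pipeline work and a per-container capacity.
    In a real YARN deployment, max_nodes would be bounded by what the
    ResourceManager can grant, and the request would go through YARN's
    container-allocation protocol (not shown here)."""
    needed = -(-queued_pipelines // per_node_capacity)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))

# With 9 queued pipelines and 4 pipelines per container, scale out to 3 nodes.
print(desired_nodes(queued_pipelines=9, per_node_capacity=4))
```

The controller re-evaluates this target as the queue grows or drains, scaling out toward the cluster limit and scaling back in when work completes.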

In terms of performance, co-locating the Snaplex inside Hadoop, next to the source or destination of the data, allows SnapLogic pipelines to efficiently stream data into HDFS from multiple sources, whether in the cloud or on-premises. Similarly, SnapLogic pipelines can be used to deliver HDFS data to external applications and data stores.

Increased Hadoop adoption, coupled with the limitations of traditional data management tools, has created a demand for a new approach to data acquisition and delivery. The good news is that, as more and more IT organizations struggle to get data in and out of Hadoop with complicated, functionally limited legacy ETL tools, a new breed of integration technology has emerged that is built to tackle today’s social, mobile, cloud and big data requirements.

SnapReduce 2.0 brings the SnapLogic iPaaS to Hadoop-scale processing.

Learn More

To learn more about SnapLogic SnapReduce 2.0, please visit: http://www.snaplogic.com/snapreduce

About Greg Benson:

I am in the fortunate position of working in both academia and industry. In addition to my work at SnapLogic, I am a Professor of Computer Science at the University of San Francisco and I have worked on research in distributed systems, parallel programming, OS kernels, and programming languages for the last 20 years. Having these two roles has been mutually beneficial. I am able to apply my research to real world systems and I bring back that experience to the classroom. At SnapLogic, I have worked on advancing our product technology and most recently I was on the original architecture team for our new cloud-based integration platform, which is a sophisticated distributed system.
