June 07, 2017

Play It Where It Lies: Hortonworks Data Platform + Cloud Object Stores

Recently, Shaun Connolly (of Hortonworks) and Tony Baer (of Ovum) presented “Get Started with Big Data in the Cloud”. During this webinar, they discussed the opportunity to take advantage of the cloud for big data workloads. As we see an increase in data analytics in the cloud, we are also seeing an increase in data landing in native cloud object stores. As more applications and systems run in the cloud, they create new data that is in turn used by other applications and systems also running in the cloud. The public cloud providers (such as Amazon Web Services and Microsoft Azure) offer cost-effective cloud object storage services for holding that data.

In parallel, we are seeing an increasing number of users deploying Hortonworks Data Platform in the cloud. It’s only natural that these users have data landing in the cloud object stores and want to process that data with the Hortonworks Connected Data Platforms.

Therefore, we are excited to announce support for integrating Hortonworks Data Platform with cloud object stores. The latest release, Hortonworks Data Platform (HDP) 2.6, includes a set of built-in “cloud connectors” that enable you to take advantage of native integration with cloud object storage services, including Amazon Web Services (for Amazon S3) and Microsoft Azure (for ADLS and WASB).

The cloud connectors allow you to seamlessly access and work with data stored in these storage services directly from HDP. By doing so, you get the best of both worlds: working with data “where it lies” in the cloud storage service, and gaining insights into that data with powerful processing engines such as Apache Hive and Apache Spark.

Cloud Object Stores

The latest release of HDP 2.6 includes support for integrating with the following cloud object storage services:

  • Amazon S3 (Simple Storage Service). The S3A connector implements the Hadoop filesystem interface using the AWS Java SDK to access the web service, and provides Hadoop applications with a filesystem view of the buckets.
  • Microsoft Azure ADLS (Azure Data Lake Store). ADLS is a WebHDFS-compatible hierarchical file system, so applications can access the data in ADLS directly using the WebHDFS REST API. Meanwhile, the ADLS connector implements the Hadoop filesystem interface using the ADLS Java SDK to access the web service.
  • Microsoft Azure WASB (Windows Azure Storage Blob). The WASB connector implements the Hadoop filesystem interface using the WASB Java SDK to access the web service, and provides Hadoop applications with a filesystem view of the blobs.
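
Because each connector exposes the store through the Hadoop filesystem interface, data in a store is addressed by a filesystem-style URI. The following is a minimal sketch, assuming the URI schemes commonly used by these connectors in Apache Hadoop (`s3a://`, `adl://`, and `wasb://` or `wasbs://`). The helper names and the bucket, account, and container names are hypothetical and chosen only for illustration; they do not come from the HDP documentation.

```python
# Sketch: build filesystem-style URIs for the three cloud connectors.
# All function names and the example bucket/account/container names
# are hypothetical; only the URI schemes reflect the Hadoop connectors.

def s3a_uri(bucket, path=""):
    """URI for an object path in an Amazon S3 bucket via the S3A connector."""
    return f"s3a://{bucket}/{path.lstrip('/')}"


def adl_uri(account, path=""):
    """URI for a path in an Azure Data Lake Store account via the ADLS connector."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def wasb_uri(container, account, path="", secure=False):
    """URI for a blob path via the WASB connector (wasbs uses TLS)."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    print(s3a_uri("example-bucket", "logs/2017/06"))
    print(adl_uri("exampleaccount", "/raw/events"))
    print(wasb_uri("examplecontainer", "exampleaccount", "tables/orders", secure=True))
```

A URI built this way would typically be usable wherever an HDFS path is accepted, provided the cluster has credentials configured for the corresponding store.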

By integrating with the cloud object storage services, you can query data via Hive external tables, read and write data with Spark, and copy data into and out of HDFS.
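
To make two of these access patterns concrete, here is a small sketch that generates a Hive external table statement pointing at an object-store location and the argument list for a `hadoop distcp` copy between HDFS and an object store. It only builds strings and does not contact a cluster. The function names, table name, columns, ORC storage format, and bucket are all assumptions made for illustration.

```python
# Sketch: generate a Hive external-table DDL and a distcp command line.
# Table, column, and bucket names are hypothetical; ORC is an assumed
# storage format, not something the announcement prescribes.

def hive_external_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement whose data lives at `location`."""
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"STORED AS ORC LOCATION '{location}'"
    )


def distcp_command(source, destination):
    """Return the argument list for copying between HDFS and an object store."""
    return ["hadoop", "distcp", source, destination]


if __name__ == "__main__":
    ddl = hive_external_table_ddl(
        "web_logs",
        [("ts", "TIMESTAMP"), ("url", "STRING")],
        "s3a://example-bucket/logs/",
    )
    print(ddl)
    print(" ".join(distcp_command("hdfs:///data/logs", "s3a://example-bucket/logs")))
```

For the Spark case, the same kind of URI can be passed to a DataFrame reader or writer, for example `spark.read.orc("s3a://example-bucket/logs/")` in PySpark, again assuming the bucket exists and credentials are configured.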

Learn More and Get Started

You can grab the latest Hortonworks Data Platform 2.6 release and get started today. Check out the resources below to learn more about the technology and the integration with cloud storage. Also, join us for DataWorks Summit on June 13–15 in San Jose and save 25% on your all-access pass: enter BLOG when you register. Come hear Hortonworks Chief Architect Sanjay Radia speak about “Dancing Elephants – Efficiently Working with Object Stores from Apache Spark and Apache Hive” during his session.

Thank you and have fun!

  • Hortonworks Data Platform 2.6: Product Documentation
  • Hortonworks Data Platform 2.6: Cloud Data Access Guide
  • Spark Summit East 2017 session: Spark and Object Stores – What You Need to Know (presenter: Steve Loughran)

