This post was authored by Ram Venkatesh, Hortonworks VP, Engineering; and James Malone, Cloud Product Manager, Google.
If you’re looking for a fully managed cloud service for running data and analytics clusters built on Apache Hadoop and Apache Spark, you might very well look to Cloud Dataproc, which offers both long-running and job-scoped clusters. Job-scoped clusters let you tailor a cluster to a specific job, instead of spending development time building individual job configurations that compete for resources on a multi-purpose cluster. Dataproc even has features such as workflow templates that help you take advantage of these ephemeral, job-scoped clusters. Our friends at Spotify Labs have also released an open-source framework, Spydra, that makes it even easier to automate this cluster lifecycle management in Cloud Dataproc.
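As a rough sketch of the job-scoped model, a Dataproc workflow template can provision an ephemeral cluster, run a job on it, and tear the cluster down automatically. The template, cluster, and region names below are placeholders:

```shell
# Create a workflow template (names and region are illustrative).
gcloud dataproc workflow-templates create sample-template --region=us-central1

# Attach a managed (ephemeral) cluster: it is created when the
# template is instantiated and deleted when the workflow finishes.
gcloud dataproc workflow-templates set-managed-cluster sample-template \
  --region=us-central1 \
  --cluster-name=ephemeral-cluster \
  --num-workers=2

# Add a Spark job step to the template.
gcloud dataproc workflow-templates add-job spark \
  --workflow-template=sample-template \
  --region=us-central1 \
  --step-id=spark-step \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar

# Run the workflow: cluster creation, job execution, cluster deletion.
gcloud dataproc workflow-templates instantiate sample-template --region=us-central1
```

Because the cluster exists only for the lifetime of the workflow, each job gets resources sized for it alone rather than competing on a shared multi-purpose cluster.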
While this model has helped many companies run purpose-built clusters in Google Cloud, a recently announced partnership with Hortonworks extends this model to Hadoop distribution-scoped clusters. Together, Hortonworks and Google enable enterprises to easily adopt Google Cloud for their big data workloads powered by Apache Hadoop, Apache Hive, Apache Spark, and the rest of the open-source data analytics ecosystem. In addition to building job-scoped clusters based on compute resources, customers can now mix and match big data workloads that are integrated with existing security and governance capabilities via Apache Ranger and Apache Atlas. This powerful architecture can help you achieve significant cost savings and massive scalability.
Cloud Storage Connector integration is available now in the Hortonworks Data Platform (HDP) 3.0 release. With this expanded collaboration, Cloud Storage has become a fully integrated connector for data access and processing in Hadoop and Spark workloads. This new, deeper integration between Google Cloud and Hortonworks enables hybrid deployment models and gives customers the consistency they need to run familiar on-premises enterprise applications in the cloud.
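As a minimal sketch of what that integration looks like in practice, assuming the Cloud Storage connector JAR is already on the Hadoop classpath, the connector is wired up through a few `core-site.xml` properties. The project ID and keyfile path below are placeholders:

```xml
<!-- core-site.xml: register the gs:// filesystem scheme -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>my-gcp-project</value> <!-- placeholder project ID -->
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/etc/hadoop/conf/gcs-key.json</value> <!-- placeholder path -->
</property>
```

With properties like these in place, existing Hadoop and Spark jobs can read and write `gs://bucket/path` URIs much as they would HDFS paths.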
As part of this partnership, HDP and Hortonworks DataFlow (HDF) are fully supported and available on Google Cloud Platform (GCP). This combination gives customers maximum flexibility and scalability to deliver large-scale data analytics across hybrid and multi-cloud deployments.
In addition to mixing and matching Hadoop distributions for more purpose-driven clusters on GCP, Hortonworks deployments can reach beyond GCP as well. With Hortonworks on GCP, customers can expect:
Scalable adoption from the enterprise edge to Google Cloud
Enterprise customers can now easily expand their big data applications using HDP and HDF running on Google Cloud. Running either platform on Google Cloud provides customers a consistent management, security, and data governance experience across hybrid and multi-cloud architectures.
Data analytics across hybrid and multi-cloud
With Cloud Storage-backed Hadoop, you get the flexibility and agility to run ephemeral HDP workloads on Google Cloud. With improved automated cloud provisioning of HDP and HDF, it is easy for customers to migrate workloads and keep them consistently and securely configured across Google Cloud, in both hybrid and multi-cloud environments.
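As one hedged example of this hybrid pattern (the bucket, paths, and job class below are placeholders, not part of any product), an on-premises HDFS dataset can be copied into Cloud Storage with `distcp`, after which an ephemeral cluster can process it directly via `gs://` paths:

```shell
# Copy a dataset from on-prem HDFS into a Cloud Storage bucket
# (source path and bucket name are illustrative).
hadoop distcp hdfs:///data/events gs://example-bucket/events

# An ephemeral HDP cluster can then read the data directly; for
# example, a Spark job taking the gs:// path as input
# (com.example.EventReport is a hypothetical application class).
spark-submit \
  --class com.example.EventReport \
  report.jar gs://example-bucket/events
```

Because the data lives in Cloud Storage rather than on cluster-local HDFS, the cluster itself can be created and deleted per workload without moving the data again.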
Real-time streaming analytics for IoT
HDF offers customers IoT data ingestion, transformation, and routing, making it easy to process real-time streaming data from the edge to the enterprise by leveraging NiFi, Kafka, SAM, Storm, and more. HDF can now be more easily integrated with your Google Cloud IoT systems and with analytics applications that perform real-time event correlation, content enrichment, complex event processing, analytical aggregation, and alerts/notifications. All of this can now be brought into the same cloud platform.
Our commitment to open source
Both Hortonworks and Google Cloud are committed to keeping this platform, as well as future versions of Hortonworks combined with the Cloud Storage connector, open source. As companies, Google and Hortonworks have long histories of open-source collaboration and believe that community-based open innovation is the best way to advance Hadoop cloud architectures.
To learn more about our partnership: