October 31, 2017

Disasters Can Be Instant, It Takes a Village to Build a Hybrid Cloud Based Recovery Solution!

Recently, Hortonworks CTO Scott Gnau introduced our Hortonworks 3.0 vision, powered by Hortonworks DataPlane Service (DPS). This is a next-gen platform that addresses real customer pain points around security, governance, data cataloging, and building extensible services.

Data Lifecycle Manager (DLM) is the first generally available extensible service that Hortonworks DPS will support. It is designed around our customers' use cases for replication, backup, tiering, and more, and it will encompass multiple sources of data (clusters, data lakes) as well as multiple tiers (on-premises, multiple clouds, hybrid).

First, Some Context

Data Lifecycle Manager (DLM) addresses many of the Business Continuity/Disaster Recovery use cases we see our customers trying to solve. It also helps them adopt a hybrid cloud approach for agility and total cost of ownership (TCO) savings: many customers want to replicate a subset of the data in their existing Hadoop clusters to the cloud and then use the cloud's compute power to run Spark jobs and similar workloads.
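To make the data-movement idea concrete, here is a minimal sketch of a one-shot HDFS-to-cloud copy using Hadoop's stock DistCp tool. The paths and bucket name are hypothetical, and this only illustrates the data path; DLM adds policy-driven scheduling, incremental replication, and failure handling on top.

```python
import subprocess

# Hypothetical source directory and cloud target; adjust for your environment.
SOURCE = "hdfs://prod-nn:8020/data/billing"
TARGET = "s3a://example-analytics-bucket/billing"

def replicate(source: str, target: str) -> None:
    """Copy an HDFS directory tree to cloud storage with Hadoop DistCp.

    -update only copies files that are missing or changed on the target,
    which is the simplest form of incremental replication.
    """
    subprocess.run(
        ["hadoop", "distcp", "-update", source, target],
        check=True,
    )

if __name__ == "__main__":
    replicate(SOURCE, TARGET)
```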

What Are the Benefits of Data Lifecycle Manager?

Data Lifecycle Manager (DLM) is a suite of products designed to provide Disaster Recovery and Hybrid Cloud capabilities. It replicates unstructured data (Hadoop file system directories) and structured data (Hive SQL databases), along with the associated metadata (Hive metastore, views, user-defined functions) and auxiliary data such as security policies maintained by Apache Ranger, within a defined Recovery Point Objective (RPO).
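To make the RPO guarantee concrete: if a replication policy promises an RPO of one hour, then after a disaster at most one hour of data may be lost, so the latest successful replication must never be more than an hour old. A minimal monitoring sketch (all names are hypothetical, not part of DLM's API):

```python
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=1)  # hypothetical policy target

def rpo_met(last_successful_sync: datetime, rpo: timedelta = RPO) -> bool:
    """Return True if the replica is fresh enough to satisfy the RPO."""
    lag = datetime.now(timezone.utc) - last_successful_sync
    return lag <= rpo

# Example: a replica synced 45 minutes ago satisfies a 1-hour RPO.
last_sync = datetime.now(timezone.utc) - timedelta(minutes=45)
print(rpo_met(last_sync))  # True
```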

DLM provides coverage across on-premises and cloud deployments while maintaining single-source-of-truth behavior, and the datasets in the disaster recovery site can be selectively made active with a failover operation.

In future versions, we plan to provide additional functionality, such as Backup & Restore and Tiering. Backup & Restore will allow customers to keep point-in-time (PiT) copies of Hadoop data so that, in case of data corruption or accidental deletion, they can go back in time and recover their data. Auto Tiering will enable customers to create dynamic policies suited to their organization and, based on user access patterns, move data between different tiers (e.g., from expensive SSD media to HDD media or Amazon S3 buckets) to reduce the total cost of ownership.
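HDFS already ships the storage-policy primitives that such auto-tiering would build on. As a hedged illustration (the path is hypothetical, and this is manual tiering rather than DLM's planned automated policy engine), the stock hdfs storagepolicies command can demote a directory from SSD-backed to archival media:

```python
import subprocess

def set_storage_policy(path: str, policy: str) -> None:
    """Assign an HDFS storage policy (e.g. HOT, WARM, COLD, ALL_SSD) to a path."""
    subprocess.run(
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", path, "-policy", policy],
        check=True,
    )

# Hypothetical example: demote last year's logs to archival (COLD) storage.
set_storage_policy("/data/logs/2016", "COLD")

# Existing blocks only move once the HDFS Mover runs against the path.
subprocess.run(["hdfs", "mover", "-p", "/data/logs/2016"], check=True)
```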

Example Use Case: A Connected Car Company with Multiple Clusters and Tiers

Below is an example of a fictional connected car company with multiple clusters and tiers. The customer wants billing data replicated to the cloud so that data scientists can do predictive pricing analytics using cloud compute. The customer also wants to replicate some of the self-driving training data sets, with the secondary data center serving as the disaster recovery site. However, the customer wants to use the secondary data center as the active site for the equipment schedule, which is stored as structured data in Hive and then replicated to the primary data center. All of these flows can be managed and configured by infrastructure administrators from a single user interface pane, as depicted in the generally available user interface. In the future, DPS will provide extensive add-on services that data scientists and business analysts can use in a self-service manner.
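One way to picture these three flows as data (purely illustrative; DLM's actual policies are defined through the DPS user interface, not this hypothetical schema):

```python
# Hypothetical policy definitions for the connected car example above.
replication_flows = [
    {
        "name": "billing-to-cloud",
        "dataset": "hdfs:/data/billing",
        "source": "primary-dc",
        "target": "cloud",
        "purpose": "predictive pricing analytics on cloud compute",
    },
    {
        "name": "training-data-dr",
        "dataset": "hdfs:/data/self-driving/training",
        "source": "primary-dc",
        "target": "secondary-dc",
        "purpose": "disaster recovery copy",
    },
    {
        "name": "equipment-schedule",
        "dataset": "hive:equipment_schedule",
        "source": "secondary-dc",  # the active site for this dataset
        "target": "primary-dc",
        "purpose": "replicate structured Hive data to the primary site",
    },
]

for flow in replication_flows:
    print(f"{flow['name']}: {flow['source']} -> {flow['target']}")
```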

It Takes a Village!

While unforeseen disasters can be instant, we have certainly realized that it takes a village to build an enterprise-grade Hybrid Cloud based Business Continuity/Disaster Recovery solution. It was truly a cross-functional collaboration spanning Product Management, Engineering, User Experience, Documentation, Sales, Support, and more. In the process, we also established new best practices: our user interface was designed before we wrote a single line of code. We ran a beta program with our customers and received valuable feedback. And this is our first service offering from Hortonworks. Last, but not least, the Data Lifecycle Manager service is powered by open source, and we are giving back to the Apache community!

How Do You Get Started?

Please check out the following resources to learn more about the product and to get started:
Hortonworks DataPlane Service & Data Lifecycle Manager Webpage: https://hortonworks.com/products/data-management/dataplane-service/
Product Documentation: https://docs.hortonworks.com/HDPDocuments/DPS1/DPS-1.0.0/index.html

If you want a demonstration of Hortonworks DPS, please reach out to the Hortonworks account team!
