June 06, 2018

Data Replication in Hadoop

In October 2017, Hortonworks launched DataPlane Service, a portfolio of solutions that enables businesses to manage, secure and govern data spread across on-premises and in-cloud data lakes. Data Lifecycle Manager (DLM) was the first extensible service to be built on the DataPlane platform.

We recently sat down with Niru Anisetti, Principal Product Manager, to talk about DLM. Prior to joining Hortonworks, Niru was Program Director in the product management team for Spark services at IBM.

Traditionally, Apache Hadoop has been associated with data storage and compute. Do you think there is an awareness of data replication in the Hadoop space?
You are absolutely right that Apache Hadoop was associated with big data storage and batch compute in the past, and that is still true in most cases. What has changed significantly is where and how the data is consumed. DLM helps customers by moving data to where their business applications run, whether that is in the cloud or in a specific data center in the EU region to comply with GDPR.

Data replication, backup and restore are fairly mature technologies. Why has Hortonworks decided to enter this market?
As noted in Gartner’s[i] July 2017 Magic Quadrant for Disaster Recovery as a Service report, “… DRaaS is now a mainstream offering.” Gartner estimated “it to be a $2.02 billion business currently, and it is expected to reach $3.73 billion by 2021.” There are more than 500 disaster recovery service providers offering a wide range of options, from fully managed services to enablement of customer self-service models. Hortonworks decided to enter this market in 2016 to fulfill the unique needs of the big data market and to enable customers to choose the optimal solution for their business use cases. Choosing a DR solution that doesn’t scale can not only increase storage costs but also jeopardize the overall operation of a company.

What are some of the industry drivers that make you excited about the future of DLM?

We are excited about the growth that the big data market continues to experience and the impact it has had on our customers’ businesses. Worldwide big data market revenues for software and services are projected to increase from $42B in 2018 to $103B in 2027, a CAGR of 10.48%, according to estimates by Wikibon. The revenue predicted from software applications in 2018 alone is $337B, according to Forrester[ii]. Furthermore, nearly 50% of the respondents to a recent McKinsey analytics survey said that analytics and big data have fundamentally changed the practices in their business functions.
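As a quick check on the growth rate quoted above, the compound annual growth rate can be recomputed from the two endpoint figures. This is a minimal sketch, not Wikibon's own method: the function name `cagr` is hypothetical, and it assumes nine compounding periods between 2018 and 2027.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the text, in billions of US dollars: $42B (2018) to $103B (2027).
# Assumption: 2027 - 2018 = 9 compounding periods.
rate = cagr(42.0, 103.0, 2027 - 2018)
print(f"{rate:.2%}")  # prints 10.48%
```

Under that assumption the result matches the 10.48% figure cited.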

We believe that there is a large market for DR solutions and, given the industry data on data movement software, DLM has set out to position itself around disaster recovery, backup and restore, and tiering.

You just launched DLM 1.1. What are some of the features that you think will resonate with customers and prospects?
DLM 1.1 was recently launched, and we believe there is strong demand for a solution like DLM among our customers. More than 35% of the Hortonworks customer base was looking to transition workloads to the cloud. These customers need a solution like DLM to move data from on-premises to the cloud so that they can run business analytics applications on the datasets where it makes sense. DLM 1.1 provides a complete solution for moving data, metadata, and security policies to the cloud. To add perspective, DLM is the only product in the market that can copy not only data but also metadata and security policies to the target cluster. DLM also supports data movement for data at rest and data in motion, whether the data is encrypted using a single key or multiple keys on the source and target clusters.

In terms of future enhancements and functionality, where do you want to take DLM and what should we keep an eye on?
The DLM team is committed to solving the problems customers face in the area of disaster recovery so that enterprises can safeguard their data and maintain continuity of operations. We plan to add functionality to support tiering so that customers can move their data to different storage media to optimize performance, reduce total cost of ownership and abide by compliance policies such as GDPR. We are also considering other workloads as part of DLM, in addition to supporting multi-cloud storage requirements in the future. What is great about DLM is that the Apache Hadoop community can participate in the development of these features, as we released DLM as an AGPL-licensed product in May 2018.


To learn more about DLM, please visit Hortonworks Data Lifecycle Manager



[i] Gartner, Magic Quadrant for Disaster Recovery as a Service. Published: 19 June 2017 ID: G00311593

[ii] Forrester, The Global Tech Market Outlook for 2018 To 2019. Published: 5 January 2018
