HDFS Forum

Setting up multiple Hadoop clusters

  • #47014
    Rajiv Gupta
    Participant

    Hi,
    In our organization, we have a Hadoop cluster with 13 data nodes and a replication factor of 2. However, we have been facing HDD issues for the last two months: there are cases where 3-4 disks fail together, leading to data loss.

    Now we want to set up a separate cluster in another geographical location, and we need to know how to synchronize data between the two clusters. Connectivity would not be an issue, but incremental data needs to be copied to the other cluster.

    Please suggest.

    Regards,
    Rajiv Gupta


    Replies
  • #47166
    Rajiv Gupta
    Participant

    Please update.

    #47195
    Robert Molina
    Moderator

    Hi Rajiv,
    Have you looked into the Distributed Copy (distcp) feature within HDFS? You can use the command to copy data from one cluster to another.
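
    For example, a basic cluster-to-cluster copy looks like the following (nn1, nn2, and the port 8020 are placeholders; substitute your own NameNode addresses and paths):

        # copy /source/path on the first cluster to /target/path on the second
        hadoop distcp hdfs://nn1:8020/source/path hdfs://nn2:8020/target/path

    This launches a MapReduce job that performs the copy in parallel across the cluster.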

    Hope that helps.

    Regards,
    Robert

    #47242
    Rajiv Gupta
    Participant

    Hi Robert,

    Thanks a lot for replying. Yes, we have seen that option; we just wanted to see if there is a better alternative or solution apart from distcp. Are there any limitations to how this command works?

    Regards
    Rajiv

    #49263
    Robert Molina
    Moderator

    Hi Rajiv,
    Sorry about the late response. Distcp would be the feature to use; there are also integration tools such as Talend. As for your specific use case, where the clusters are physically distant from each other, it should work in theory. The Hortonworks services team may have done something similar in the past; please feel free to reach out to them if you still need assistance in designing this use case.
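
    For the incremental copies you mentioned, distcp's -update flag skips files that already exist on the target with the same size and checksum, so a periodically scheduled run only transfers new or changed data (hosts and paths below are placeholders):

        # re-run periodically; -update copies only new/changed files, -p preserves file attributes
        hadoop distcp -update -p hdfs://nn1:8020/data hdfs://nn2:8020/data

    One limitation to keep in mind: if the two clusters run different Hadoop versions, the HDFS RPC protocols may not be compatible, in which case the usual approach is to read the source over webhdfs:// (or hftp:// on older releases) instead of hdfs://.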

    http://hortonworks.com/contact/

    Regards,
    Robert
