We’re looking at backup methods for our Hadoop clusters. We quickly found that distcp will not work for copying data between our clusters, because the NameNodes have been configured on networks that are private to each cluster. As a result, the DataNodes in our primary cluster cannot talk to the NameNode on the target cluster.
However, the edge nodes for our two clusters have been configured so that they can, in fact, talk to both private networks. The network designers have placed nice 1G pipes on the edge node hosts so that the edge nodes can move data in and out of the clusters.
The network team is of the opinion that Falcon can extract data from the DataNodes in the primary cluster through a single edge node and transfer that data to the target cluster. My understanding is that Falcon is built on top of distcp and that, if distcp can’t move the data, then Falcon can’t move it either.
Who is correct? Can Falcon funnel all the data in a transfer through a single edge node, as they believe? [I realize that, even if it is possible, it creates a bottleneck, but that’s a separate issue right now; the question is whether it can be done at all.]