Need Redundancy, Not Big Data

This topic contains 7 replies, has 2 voices, and was last updated by Jeff Bowman 8 months, 4 weeks ago.

  • Creator
    Topic
  • #46244

    Jeff Bowman
    Participant

    I have a customer on whose behalf I’m investigating Hadoop on Windows. He’s running Windows Server 2012 Essentials Edition in a Hyper-V VM. His is a small company, with fewer than 25 workstations and less than 4TB of storage requirements.

    However, redundant and reliable offsite backup is very important.

    I’m wondering whether HDFS can fill this need. Is it possible to set up a few remote machines and configure them as nodes, and then be able to remove any one of them at any time without impacting the data store as a whole?

    Thanks,
    Jeff Bowman
    Fairbanks, Alaska



  • Author
    Replies
  • #49278

    Jeff Bowman
    Participant

    Hi Robert

    This is very, very good news. Thank you so much.

    Thanks,
    Jeff Bowman
    Fairbanks, Alaska

    #49270

    Robert Molina
    Moderator

    Hi Jeff,
    Correct, as long as the cluster is healthy and you have enough nodes to accommodate the default three replicas, the data should be fine. But keep in mind that you still have to be aware of what is being done on the cluster. For instance, someone can manually override replication and set the replication factor to 1 for a file. If a node goes down and that file’s block was only on that one node, the file would not be accessible.
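
    As a rough illustration of that point, here is a minimal sketch that uses the Hadoop FileSystem Java API to check a file’s replication factor and raise it back to the default of three if someone has lowered it. The path /backups/example.dat is hypothetical, and the sketch assumes the cluster’s core-site.xml and hdfs-site.xml are on the client classpath.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class CheckReplication {
            public static void main(String[] args) throws Exception {
                // Picks up core-site.xml / hdfs-site.xml from the classpath
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // Hypothetical file used for illustration
                Path file = new Path("/backups/example.dat");

                // Report the file's current replication factor
                short current = fs.getFileStatus(file).getReplication();
                System.out.println("Replication for " + file + ": " + current);

                // If it was manually lowered, ask the NameNode to bring it back to 3
                if (current < 3) {
                    fs.setReplication(file, (short) 3);
                }

                fs.close();
            }
        }

    The same check can also be done from the command line with hdfs dfs -setrep and hdfs fsck, without writing any code.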

    Regards,
    Robert

    #48088

    Jeff Bowman
    Participant

    Is my understanding correct?

    Thanks,
    Jeff Bowman
    Fairbanks, Alaska

    #47229

    Jeff Bowman
    Participant

    Hi Robert

    This sounds like good news.

    OK then, just to recap: we can have a node fail unexpectedly, such as with a hard drive crash, and then simply replace it, confident that no files were or will be lost.

    Thanks,
    Jeff Bowman
    Fairbanks, Alaska

    #47214

    Robert Molina
    Moderator

    Hi Jeff,
    If there are under-replicated blocks, HDFS should automatically add replicas for them. The same goes for excess replicas: HDFS will try to remove the excess. So yes, if a node fails, you can take it offline. Decommissioning is the elegant way of removing a node from the cluster.
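
    As a rough illustration of what “under-replicated” means from the client side, the sketch below walks a directory with the Hadoop FileSystem Java API and flags any block that currently lives on fewer hosts than the file’s target replication factor. The /backups directory is hypothetical, and in practice hdfs fsck (and the NameNode itself) is the authoritative way to see and repair under-replicated blocks.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.BlockLocation;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ReplicaCheck {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());

                // Hypothetical directory to inspect
                for (FileStatus status : fs.listStatus(new Path("/backups"))) {
                    if (status.isDirectory()) continue;

                    short target = status.getReplication();
                    BlockLocation[] blocks =
                        fs.getFileBlockLocations(status, 0, status.getLen());

                    // Compare how many hosts hold each block against the target
                    for (BlockLocation block : blocks) {
                        int live = block.getHosts().length;
                        if (live < target) {
                            System.out.println(status.getPath() + ": block at offset "
                                + block.getOffset() + " has " + live
                                + " of " + target + " replicas");
                        }
                    }
                }

                fs.close();
            }
        }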

    Regards,
    Robert

    #46558

    Jeff Bowman
    Participant

    Hi Robert

    Thanks for this; it helps. However, we’re not quite all the way there yet.

    I could be mistaken, but I thought one of the features of HDFS (being based on Google’s file system) was fail-safe redundancy: in the event of a machine failure, the failed node could simply be taken offline and replaced, and the file system would then rebuild itself back to its previous hardened state.

    Sort of a RAID6 across the WAN, if you will.

    Am I misunderstanding this part of it?

    Thanks,
    Jeff Bowman
    Fairbanks, Alaska

    #46356

    Robert Molina
    Moderator

    Hi Jeff,
    HDFS should be able to fulfill the need for redundant and reliable storage to back up your data. Yes, it is possible to set up a few machines, configure them as nodes, and then remove them at a later time. There is a decommissioning feature that allows you to remove a node from the cluster; it redistributes the node’s blocks to the remaining nodes before the node is fully removed, to preserve data integrity and redundancy.
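
    As a minimal sketch of using HDFS this way, the Java snippet below copies a local backup archive into the cluster with the FileSystem API; the blocks are replicated to the default three DataNodes as they are written. The NameNode address and both paths are assumptions for illustration only.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class BackupToHdfs {
            public static void main(String[] args) throws Exception {
                // Hypothetical NameNode address; normally this comes from core-site.xml
                Configuration conf = new Configuration();
                conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

                FileSystem fs = FileSystem.get(conf);

                // Copy a local archive into HDFS; replication across DataNodes
                // happens automatically as the blocks are written
                Path local = new Path("file:///backups/nightly.tar.gz");  // hypothetical source
                Path remote = new Path("/backups/nightly.tar.gz");        // hypothetical destination
                fs.copyFromLocalFile(local, remote);

                fs.close();
            }
        }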

    Regards,
    Robert
