HDFS Forum

Adding new data volumes

  • #25150

    Hi,

    I’m trying to add some disk space to one of our nodes. We currently have a heterogeneous setup with one data directory configured on each node. I was able to acquire some extra disks for one of the nodes, arranged in a JBOD format. I’m not able to get any for our other nodes at the moment.

    As I understand it, Hadoop should support heterogeneous disk layouts. I understand it’s not optimal, but it’s the best I can do for now.

    The official Hadoop documentation for version 1.1.2 states that the variable dfs.data.dir “Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.”

    Since directories that don’t exist should be ignored, I decided to change dfs.data.dir to the existing directory plus the four new ones, plus five that don’t exist yet for future expansion. I did this through the REST API since it’s not possible to change it via Ambari directly. I also changed the “DataNode volumes failure toleration” to 9 so that I wouldn’t get startup errors.
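Before restarting a DataNode with a longer dfs.data.dir list, it can help to see which of the listed directories actually exist on that node. The following is a minimal sketch, not part of Hadoop or Ambari: the function names and the example paths are hypothetical, and it only inspects the local filesystem.

```python
import os


def split_data_dirs(value):
    """Split a comma-delimited dfs.data.dir value into trimmed, non-empty paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def partition_by_existence(paths):
    """Return (existing, missing) lists, preserving the configured order."""
    existing = [p for p in paths if os.path.isdir(p)]
    missing = [p for p in paths if not os.path.isdir(p)]
    return existing, missing


if __name__ == "__main__":
    # Hypothetical value; substitute the list configured for the node being checked.
    configured = "/hadoop/hdfs/data, /mnt/disk1/hdfs/data, /mnt/disk2/hdfs/data"
    existing, missing = partition_by_existence(split_data_dirs(configured))
    print("existing:", existing)
    print("missing: ", missing)
```

Running this on each node shows how many of the configured volumes that node could actually use, which is the number to compare against the failure toleration setting.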

    The problem is that now I get the following error on all the datanodes, which fail to start:

    2013-05-10 13:48:57,650 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for validVolsRequired : -8 , Current valid volumes: 1
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:982)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:407)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:313)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1674)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1613)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1631)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1757)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1774)
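The numbers in the error message can be sanity-checked with a little arithmetic. The sketch below assumes, based only on the wording of the message, that the DataNode computes the required number of valid volumes as the number of configured directories minus the tolerated failures, and that the result must be at least 1 and no greater than the number of currently valid volumes. Both function names are hypothetical, and the formula is an assumption rather than a quotation of the Hadoop source.

```python
def valid_vols_required(configured, tolerated):
    """Assumed formula: volumes that must be healthy = configured - tolerated."""
    return configured - tolerated


def passes_check(configured, tolerated, current_valid):
    """Assumed rule: the requirement must be >= 1 and <= the valid volume count."""
    required = valid_vols_required(configured, tolerated)
    return 1 <= required <= current_valid


if __name__ == "__main__":
    # One configured directory with a tolerance of 9 reproduces the logged value.
    print(valid_vols_required(1, 9))   # -8
    print(passes_check(1, 9, 1))       # False
    # Ten configured directories (1 existing + 4 new + 5 placeholders), 9 tolerated.
    print(valid_vols_required(10, 9))  # 1
```

If that assumption holds, a logged value of -8 with a tolerance of 9 would mean the DataNode saw only one configured directory, which suggests checking whether the new dfs.data.dir value actually reached the nodes alongside the toleration setting.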

    Is it even possible to add additional data volumes to a single node? What would be the recommended solution to this, assuming that I cannot acquire extra hard drives for some time?

    Thank you,
    John


  • #26433
    tedr
    Member

    Hi John,

    Unfortunately, Ambari does not currently support differing volumes on data nodes. This feature is planned for a future release.

    Thanks,
    Ted.

The topic ‘Adding new data volumes’ is closed to new replies.
