Starting HDFS Error during HMC cluster installation

This topic contains 10 replies, has 4 voices, and was last updated by Seth Lyubich 1 year, 9 months ago.

  • Creator
    Topic
  • #13181

    Trang Nguyen
    Member

    Hi,

I’m running into a failure starting HDFS during HMC cluster installation:
    2013-01-03 03:45:59,916 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
    2013-01-03 03:45:59,923 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/hadoopdata/hadoop/hdfs/namenode/current/edits
    2013-01-03 03:45:59,923 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 0, editlog=/hadoopdata/hadoop/hdfs/namenode/current/edits
    2013-01-03 03:45:59,924 ERROR org.apache.hadoop.hdfs.server.common.Storage: error retrying to reopen storage directory ‘/hadoopdata/hadoop/hdfs/namenode’
    java.io.FileNotFoundException: /hadoopdata/hadoop/hdfs/namenode/current/edits.new (No such file or directory)

2013-01-03 04:28:55,416 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
    java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.&lt;init&gt;(FSNamesystem.java:369)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.&lt;init&gt;(NameNode.java:473)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1256)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1265)

    Any help would be appreciated.

    Thanks,
    Trang


  • Author
    Replies
  • #13206

    Seth Lyubich
    Keymaster

    Hi Trang,

There is no way to uninstall just Hive by itself. I think at this point it is better to let HMC install and configure MySQL. Can you please try to remove MySQL and rerun the installer? Please try the following:

    yum erase hmc puppet
    yum erase mysql
Then rerun the installation per the documentation.

Once you get to the cluster setup URL, please don’t put anything for the MySQL location; provide only the password. This configuration should allow HMC to install and configure MySQL by itself.
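For reference, a minimal sketch of the full clean-up and reinstall sequence, pieced together from the commands used elsewhere in this thread (package names assume the standard HDP 1.x repositories):

yum erase hmc puppet    # remove the HMC controller and the Puppet agent
yum erase mysql         # remove the manually installed MySQL packages
yum install hmc         # reinstall HMC per the documentation
service hmc start       # then rerun the cluster install from the HMC UI, leaving the MySQL location blank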

    Hope this helps.
    Please let us know if you are able to complete your installation.

    Thanks,
    Seth

    #13203

    Trang Nguyen
    Member

I was able to get through the above issues by reinstalling Puppet on the master and all slave nodes.
I got as far as the Hive installation. It failed because the MySQL dependencies could not be successfully installed on CentOS 6. I got around this by manually installing MySQL:

    yum install mysql-server
    yum install MySQL-client
    service mysql start

I verified that I could log in to MySQL as root.

I also manually copied the mysql-connector-java JAR onto the Hive server:
    cp ./mysql-connector-java-5.1.18/mysql-connector-java-5.1.18-bin.jar /usr/lib/hive/lib/.
    chmod 644 /usr/lib/hive/lib/*mysql*.jar
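As a quick check (assuming the standard /usr/lib/hive layout), the connector should now show up under Hive’s lib directory:

ls -l /usr/lib/hive/lib/mysql-connector-java-*.jar   # should be present and readable (mode 644)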

However, when I log in as the “hive” user and attempt to start the metastore service, I get the following error:
su hive
/usr/lib/hive/bin/hive --service metastore
    Error:

    Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver (“com.mysql.jdbc.Driver”) was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:57)
    at org.datanucleus.store.rdbms.datasource.DBCPDataSourceFactory.makePooledDataSource(DBCPDataSourceFactory.java:54)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:182)
    … 52 more

Also, is there a way to uninstall just the Hive installation, instead of uninstalling the entire cluster and restarting HMC?
Each time things fail, I’ve found I’ve had to reinstall HMC; otherwise it attempts to reinstall the entire cluster.

    Thanks,
    Trang

    #13201

    Trang Nguyen
    Member

    Hi Ted,

    I gave up, uninstalled the cluster, removed all the config files and reinstalled HMC and puppet on the master.
    However, now it seems to fail during the HDFS test. When I check the logs, I see that after the cluster installation, HMC had issues reaching the slave nodes:
[2013:01:04 20:25:56][INFO][Cluster:SmsHadoopCluster][Cluster.php:794][installService]: Installing service NAGIOS complete.
    [2013:01:04 20:25:56][INFO][Cluster:SmsHadoopCluster][Cluster.php:792][installService]: Installing service MISCELLANEOUS …
    [2013:01:04 20:25:56][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: MISCELLANEOUS – INSTALLING
    [2013:01:04 20:25:56][INFO][Service: MISCELLANEOUS (SmsHadoopCluster)][Service.php:130][setState]: MISCELLANEOUS – INSTALLING dryRun=
    [2013:01:04 20:25:56][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: MISCELLANEOUS – INSTALLED
    [2013:01:04 20:25:56][INFO][Service: MISCELLANEOUS (SmsHadoopCluster)][Service.php:130][setState]: MISCELLANEOUS – INSTALLED dryRun=
    [2013:01:04 20:25:56][INFO][Cluster:SmsHadoopCluster][Cluster.php:794][installService]: Installing service MISCELLANEOUS complete.
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:277][genKickWait]: rm -f /etc/puppet/master/modules/catalog/files/modules.tgz
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:280][genKickWait]: tar zcf /etc/puppet/master/manifestloader/modules.tgz /etc/puppet/master/modules
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:283][genKickWait]: mv /etc/puppet/master/manifestloader/modules.tgz /etc/puppet/master/modules/catalog/files
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:292][genKickWait]: Kick attempt (1/3)
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:332][waitForResults]: Waiting for results from xdc-tst-mapre-001.openmarket.com,xdc-tst-mapre-004.openmarket.com,xdc-tst-mapre-006.openmarket.com,xdc-tst-mapre-003.openmarket.com,xdc-tst-mapre-002.openmarket.com,xdc-tst-mapre-005.openmarket.com
    [2013:01:04 20:25:56][INFO][PuppetInvoker][PuppetInvoker.php:336][waitForResults]: 0 out of 6 nodes have reported for txn 3-2-0

However, when I check the nodes, I do see that the HDFS master and secondary NameNodes were successfully started:
hdfs 27802 1 0 15:10 ? 00:00:04 /usr/jdk64/jdk1.6.0_31/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop

    HMC stops upon trying to start HDFS.
    Any advice on how to proceed?

    Thanks,
    Trang

    #13197

    tedr
    Member

    Hi Trang,

You should also check the permissions on that folder. It must be accessible (readable and writable) by the ‘hdfs’ user.
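For example, something along these lines (the exact path and the hdfs:hadoop owner/group are assumptions based on the directories mentioned in this thread):

ls -ld /hadoopdata/hadoop/hdfs/namenode        # should be owned by hdfs and writable by it
chown -R hdfs:hadoop /hadoopdata/hadoop/hdfs   # fix the ownership if it is wrong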

    Thanks,
    Ted.

    #13196

    Larry Liu
    Moderator

    Hi, Trang

    Since you have done several HMC installations, can you please check /etc/hadoop/conf to make sure the configuration is as expected based on your installation?

In core-site.xml, you can find the location of the secondary NameNode checkpoint directory.
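For example (fs.checkpoint.dir is the Hadoop 1.x property name; an illustrative check, not something run in this thread):

grep -A1 'fs.checkpoint.dir' /etc/hadoop/conf/core-site.xml   # secondary NameNode checkpoint directory
grep -A1 'dfs.name.dir' /etc/hadoop/conf/hdfs-site.xml        # NameNode metadata directory, for comparison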

If the configuration files are messed up, you can try deleting all the configuration and starting a new install.

    Hope this helps.

    Thanks

    Larry

    #13190

    Trang Nguyen
    Member

The logs on the secondary NameNode indicate that it is using the wrong folder:
2013-01-03 19:51:56,287 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /hadoopInstall/hadoop/hdfs/namesecondary does not exist.
    2013-01-03 19:51:56,287 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
    2013-01-03 19:51:56,287 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hadoopInstall/hadoop/hdfs/namesecondary is in an inconsistent state: checkpoint directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:619)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.startCheckpoint(SecondaryNameNode.java:435)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:398)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:311)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:274)
    at java.lang.Thread.run(Thread.java:662)

The actual directory should be /hadoopdata/hadoop (I had done several HMC installs and at one point switched to a different folder).

    [root@xdc-tst-mapre-002 hadoop]# ls
    hdfs mapred zookeeper

    Is there a way to fix this?
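One hedged sketch of two possible ways to reconcile this (hypothetical; which one is right depends on what the rest of the configuration points at): either create the directory the secondary NameNode expects, or repoint fs.checkpoint.dir in /etc/hadoop/conf/core-site.xml at the existing /hadoopdata location.

mkdir -p /hadoopInstall/hadoop/hdfs/namesecondary
chown -R hdfs:hadoop /hadoopInstall/hadoop/hdfs/namesecondary   # hdfs:hadoop owner/group is an assumption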

    Thanks,
    Trang

    #13189

    Larry Liu
    Moderator

    Hi, Trang

Can you please also provide the file list for the secondary NameNode current folder?

    Thanks

    Larry

    #13188

    Trang Nguyen
    Member

    Hi Larry,

    I was able to get around the issue by doing these steps:
    su hdfs
    hadoop namenode -format
    exit
    yum erase hmc puppet
    yum install hmc
    service hmc start

    However, now HMC is reporting an error during the testing of the HBase instance.

    Logs on hbase master:

2013-01-03 13:17:26,837 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/xdc-tst-mapre-001.openmarket.com,60000,1357237046481 already deleted, and this is not a retry
    2013-01-03 13:17:26,837 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=xdc-tst-mapre-001.openmarket.com,60000,1357237046481
    2013-01-03 13:17:28,027 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
    2013-01-03 13:17:29,028 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
    2013-01-03 13:17:30,029 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
    2013-01-03 13:17:31,029 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
    2013-01-03 13:17:32,030 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020. Already tried 4 time(s

2013-01-03 13:18:17,059 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
    java.net.ConnectException: Call to xdc-tst-mapre-001.openmarket.com/10.9.197.68:8020 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1121)
    at org.apache.hadoop.ipc.Client.call(Client.java:1097)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy10.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:120)
at org.apache.hadoop.hdfs.DFSClient.&lt;init&gt;(DFSClient.java:258)
at org.apache.hadoop.hdfs.DFSClient.&lt;init&gt;(DFSClient.java:223)

It looks like the HBase master is no longer running, so I had to start it manually:

/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf/ start master
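Given that the error above is a connection refused to the NameNode RPC port (8020), a quick check before restarting the master might be (illustrative commands, not from the original post):

netstat -tlnp | grep 8020   # is the NameNode actually listening on its RPC port?
hadoop fs -ls /             # does HDFS respond at all from this host?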

Is there anything else that can be done, or logs I could send you? Could I somehow restart HMC without having to uninstall it?

    Thanks,
    Trang

    #13186

    Larry Liu
    Moderator

    Hi, Trang,

    Thanks for trying HDP.

    From the error log, I see the following error:

    2013-01-03 03:45:59,924 ERROR org.apache.hadoop.hdfs.server.common.Storage: error retrying to reopen storage directory ‘/hadoopdata/hadoop/hdfs/namenode’
    java.io.FileNotFoundException: /hadoopdata/hadoop/hdfs/namenode/current/edits.new (No such file or directory)

Can you please provide the file list for the directory /hadoopdata/hadoop/hdfs/namenode/current/ and also upload the configuration /etc/hadoop/conf to our FTP server?

    http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/
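For example, the requested listing and configuration archive could be produced with (illustrative commands, not part of the original post):

ls -l /hadoopdata/hadoop/hdfs/namenode/current/
tar czhf hadoop-conf.tgz /etc/hadoop/conf   # -h follows the symlink if /etc/hadoop/conf is a link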

    Thanks

    Larry

    #13182

    Trang Nguyen
    Member

    Adding some more context. This is occurring on the master node:

2013-01-03 03:40:43,715 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.9.197.70:50010
    2013-01-03 03:40:43,715 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.9.197.70:50010
    2013-01-03 03:40:43,717 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.blocksBeingWrittenReport: from 10.9.197.70:50010 0 blocks
    2013-01-03 03:41:04,729 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 10.9.197.70:50010, blocks: 0, processing time: 0 msecs
    2013-01-03 03:42:43,508 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 10.9.197.69:50010, blocks: 0, processing time: 0 msecs
    2013-01-03 03:45:59,916 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.9.197.67
    2013-01-03 03:45:59,916 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
    2013-01-03 03:45:59,923 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/hadoopdata/hadoop/hdfs/namenode/current/edits
    2013-01-03 03:45:59,923 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 0, editlog=/hadoopdata/hadoop/hdfs/namenode/current/edits
    2013-01-03 03:45:59,924 ERROR org.apache.hadoop.hdfs.server.common.Storage: error retrying to reopen storage directory ‘/hadoopdata/hadoop/hdfs/namenode’
    java.io.FileNotFoundException: /hadoopdata/hadoop/hdfs/namenode/current/edits.new (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:216)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.&lt;init&gt;(FSEditLog.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1423)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1608)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5211)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.rollEditLog(NameNode.java:885)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
