
The legacy Hortonworks Forum is now closed. You can view a read-only version of the former site by clicking here. The site will be taken offline on January 31, 2016.

HDP on Linux – Installation Forum

How to fix HMC node only without reinstalling a new cluster?

  • #13179

    Hi all,

    My HMC node failed with "JSON Parse failed"; this was caused by a sudden loss of AC 110V power last night.
    So I rebooted my Hadoop cluster (5 nodes) this morning, hoping it would come back to life.
    Unfortunately, the HMC web page still showed "JSON Parse failed".
    I had no idea how to fix "JSON Parse failed", so I chose to remove and reinstall HMC:

    # yum remove hmc
    # yum install hmc
    # service hmc start

    After that, my HMC web page went to the Hortonworks welcome page
    ("Welcome to Hortonworks Management Center", with a button labeled "Get Started"),
    which is the sign of a new cluster installation.
    I did not mean to reinstall the whole cluster; I just wanted to fix HMC, which got stuck with "JSON Parse failed" after the sudden power loss last night.

    Question 1:
    Does anybody know how to safely recover my HMC without having to reinstall a new cluster?
    Also, is there any way to fix HMC without reinstalling the Hadoop cluster if HMC fails again for some reason in the future?

    Regards,

    Jeff

  • #13187
    Larry Liu
    Moderator

    Hi, Jeff

    Thanks for trying HMC.

    Please try the following workaround:

    On the box where HMC is installed, run:
    sqlite3 /var/db/hmc/data/data.db

    From sqlite prompt, issue the following commands:
    update hostroles set state='STOPPED' where state='STOPPING';
    update hostroles set state='STOPPED' where state='STARTED';
    update hostroles set state='STOPPED' where state='FAILED';
    update serviceinfo set state='STOPPED' where state='STOPPING';
    update serviceinfo set state='STOPPED' where state='STARTED';
    update serviceinfo set state='STOPPED' where state='FAILED';
    update servicecomponentinfo set state='STOPPED' where state='STOPPING';
    update servicecomponentinfo set state='STOPPED' where state='STARTED';
    update servicecomponentinfo set state='STOPPED' where state='FAILED';
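The nine UPDATE statements above can also be applied in one batch. A minimal sketch using Python's stdlib sqlite3 module (the thread itself uses the sqlite3 CLI; the table and column names are taken from the schema posted later in this thread):

```python
import sqlite3

# States left behind by an unclean shutdown that should be reset.
STALE_STATES = ("STOPPING", "STARTED", "FAILED")
TABLES = ("hostroles", "serviceinfo", "servicecomponentinfo")

def reset_states(db_path):
    """Reset stale service/role states in the HMC database to STOPPED."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            for table in TABLES:
                for state in STALE_STATES:
                    conn.execute(
                        f"UPDATE {table} SET state = 'STOPPED' WHERE state = ?",
                        (state,),
                    )
    finally:
        conn.close()
```

Note the plain ASCII single quotes around 'STOPPED'; the parameter placeholder avoids quoting the WHERE value at all.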

    Hope this helps.

    Thanks

    Larry

    #13191

    Hi Larry:

    Thanks for your help.
    I think your info will be very helpful if I meet the same problem in the future,
    but unfortunately I ran into the following problem while trying to fix my HMC.

    [root@host011 /]# sqlite3 /var/db/hmc/data/data.db
    SQLite version 3.6.20
    Enter “.help” for instructions
    Enter SQL statements terminated with a “;”
    sqlite> update hostroles set state=’STOPPED’ where state=’STOPPING’;
    Error: no such column: ’STOPPED’
    sqlite> update hostroles set state=’STOPPED’ where state=’STARTED’;
    Error: no such column: ’STOPPED’
    sqlite> update hostroles set state=’STOPPED’ where state=’FAILED’;
    Error: no such column: ’STOPPED’
    sqlite> update serviceinfo set state=’STOPPED’ where state=’STOPPING’;
    Error: no such column: ’STOPPED’
    sqlite> update serviceinfo set state=’STOPPED’ where state=’STARTED’;
    Error: no such column: ’STOPPED’
    sqlite> update serviceinfo set state=’STOPPED’ where state=’FAILED’;
    Error: no such column: ’STOPPED’
    sqlite> update servicecomponentinfo set state=’STOPPED’ where state=’STOPPING’;
    Error: no such column: ’STOPPED’
    sqlite> update servicecomponentinfo set state=’STOPPED’ where state=’STARTED’;
    Error: no such column: ’STOPPED’
    sqlite> update servicecomponentinfo set state=’STOPPED’ where state=’FAILED’;
    Error: no such column: ’STOPPED’

    It seems the original database /var/db/hmc/data/data.db was removed yesterday, because I had run the following steps:
    # yum remove hmc
    # yum install hmc
    # service hmc start

    Now what else should I do?

    Is the only way to reinstall a new cluster?
    ===========================================
    database info
    ============================================
    [root@host011 /]# sqlite3 /var/db/hmc/data/data.db “.table”
    Clusters ServiceComponents
    ConfigHistory ServiceConfig
    ConfigProperties ServiceDependencies
    HostRoleConfig ServiceInfo
    HostRoles Services
    Hosts SubTransactionStatus
    ServiceComponentDependencies TransactionStatus
    ServiceComponentInfo
    [root@host011 /]# sqlite3 /var/db/hmc/data/data.db “PRAGMA table_info(Hostroles)”
    0|role_id|INTEGER|0||1
    1|cluster_name|TEXT|0||0
    2|host_name|TEXT|0||0
    3|component_name|TEXT|0||0
    4|state|TEXT|0||0
    5|desired_state|TEXT|0||0

    Regards,
    Jeff
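One plausible reading of the errors above: the schema dump shows that HostRoles and its state column do exist, so the database was not empty. "no such column: ’STOPPED’" is what SQLite reports when curly (typographic) quotes are pasted in place of plain ASCII quotes: SQLite treats bytes above 0x7F as identifier characters, so ’STOPPED’ parses as a column name rather than a string literal. A small reproduction using Python's stdlib sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hostroles (state TEXT)")
conn.execute("INSERT INTO hostroles VALUES ('STARTED')")

# Curly quotes: SQLite parses ’STOPPED’ as an identifier, not a string.
try:
    conn.execute("UPDATE hostroles SET state=’STOPPED’ WHERE state=’STARTED’")
except sqlite3.OperationalError as e:
    print(e)  # no such column: ’STOPPED’

# Plain ASCII quotes behave as intended.
conn.execute("UPDATE hostroles SET state='STOPPED' WHERE state='STARTED'")
print(conn.execute("SELECT state FROM hostroles").fetchone()[0])  # STOPPED
```

Retyping the statements with straight quotes (rather than pasting from a web page) would rule this cause out.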

    #13195
    Larry Liu
    Moderator

    Hi, Jeff

    I think it is a good idea to reinstall HMC to get a clean installation.

    Larry

    #13209

    Dear Larry:

    Thanks! I’ll install a new cluster.

    Jeff

    #13227
    tedr
    Member

    Hi Jeff,

    How did the re-install of the cluster go?

    Thanks,
    Ted.

    #13237

    Hi tedr,

    Thanks for your help.
    After reinstalling the cluster, I ran into a new problem.
    Here is what I did:
    (1) Reinstalled CentOS 6.3 on 10 nodes
    (2) Prepared the BIT
    (3) service hmc start
    (4) Deployed the cluster:
    3 master nodes:
    host011: NameNode ok
    host012: SNN ok + JobTracker ok
    host013: HBase ok
    7 slave nodes:
    host001: HMC ok + TaskTracker ok + DataNode ok + RegionServer ok
    host002: TaskTracker ok + DataNode ok + RegionServer ok
    host003: TaskTracker ok + DataNode ok + RegionServer ok
    host004: TaskTracker ok + DataNode ok + RegionServer ok
    host006: TaskTracker ok + RegionServer ok
    host011: TaskTracker ok + DataNode + RegionServer ok
    host012: TaskTracker ok + DataNode ok + RegionServer ok

    If you run # jps on host006, you can only see TaskTracker and RegionServer, no DataNode:
    [root@host006 ~]# jps
    32329 HRegionServer
    6938 Jps
    31474 TaskTracker

    So the 7 slaves have 6 DataNodes + 7 TaskTrackers + 7 RegionServers.
    There should be 7 DataNodes among the 10 nodes.
    Why are there only 6 DataNodes, even though I have uninstalled and reinstalled 3 more times?
    I wonder: is this just a network issue, or an HMC bug?
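A quick way to spot which expected daemons are missing on a host is to diff the jps output against the expected set. A small sketch, assuming jps output in the two-column format shown above (the daemon names DataNode, TaskTracker, and HRegionServer are the standard ones for this stack):

```python
def missing_daemons(jps_output, expected):
    """Return expected daemon names that are absent from `jps` output."""
    running = set()
    for line in jps_output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <DaemonName>"
            running.add(parts[1])
    return sorted(set(expected) - running)

# jps output from host006 as posted above.
jps_host006 = """\
32329 HRegionServer
6938 Jps
31474 TaskTracker
"""

print(missing_daemons(jps_host006, ["DataNode", "TaskTracker", "HRegionServer"]))
# ['DataNode']
```

Running this against each slave node would confirm that host006 is the only one missing its DataNode.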

    P.S. Nagios alerts:
    [01-08-2013 00:03:08] Caught SIGTERM, shutting down…
    Service Warning[01-08-2013 00:02:53] SERVICE ALERT: host011.dmo.com;HDFS::Percent DataNodes down;WARNING;HARD;3;WARNING: total:, affected:
    Service Critical[01-08-2013 00:02:43] SERVICE ALERT: host006.dmo.com;DATANODE::Process down;CRITICAL;SOFT;2;Connection refused
    Service Warning[01-08-2013 00:02:33] SERVICE ALERT: host011.dmo.com;HDFS::Percent DataNodes down;WARNING;SOFT;2;WARNING: total:, affected:
    Service Warning[01-08-2013 00:02:23] SERVICE ALERT: host011.dmo.com;HDFS::Percent DataNodes down;WARNING;SOFT;1;WARNING: total:, affected:
    Service Critical[01-08-2013 00:02:13] SERVICE ALERT: host006.dmo.com;DATANODE::Process down;CRITICAL;SOFT;1;Connection refused
    Program Start[01-08-2013 00:01:03] Nagios 3.2.3 starting… (PID=12062)
    Program End[01-08-2013 00:01:02] Caught SIGTERM, shutting down…
    Program Start[01-08-2013 00:01:01] Nagios 3.2.3 starting… (PID=11985)

    Regards,
    Jeff

    #13250
    tedr
    Member

    Hi Jeff,

    Can you send us the DataNode logs from host006? This will let us see whether the DataNode has a problem starting up on that server. The log is usually located in the /var/log/hadoop/hdfs directory and will have 'datanode' in the filename.

    Thanks,
    Ted.
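Collecting the log Ted asks for can be scripted. A sketch using Python's stdlib glob module, assuming the /var/log/hadoop/hdfs path from the post (the exact directory can vary between versions):

```python
import glob
import os

def find_datanode_logs(log_dir="/var/log/hadoop/hdfs"):
    """Return paths of log files with 'datanode' in the name, newest first."""
    paths = glob.glob(os.path.join(log_dir, "*datanode*"))
    return sorted(paths, key=os.path.getmtime, reverse=True)
```

The first entry returned is the most recently written DataNode log, which is the one worth attaching to a reply.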

The forum ‘HDP on Linux – Installation’ is closed to new topics and replies.
