HDP on Linux – Installation: Is it possible to STOP and START an HDP cluster running on Amazon’s EC2?

This topic contains 6 replies, has 3 voices, and was last updated by rajeev kaul 1 year, 3 months ago.

  • Creator
    Topic
  • #13665

    rajeev kaul
    Participant

Is it possible to STOP and START an HDP cluster running on Amazon’s EC2? The reason I am asking is that the IPs of the nodes in the cluster change every time they are restarted. One possible approach may be to use the AWS Route53 DNS service to map a DNS name to each node’s internal IP address. During the boot process we can create an A record that maps the DNS name to the new internal IP address, and the HMC installation would be given the DNS names instead of the internal IP addresses. But is this enough? Or do we also need to change the configurations of other components such as Puppet, Nagios, and Ganglia? Has anyone tried this successfully? Is it even possible?
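The boot-time Route53 update described above could be sketched roughly as follows. Everything here is a hypothetical illustration, not a tested recipe: the record name and hosted-zone id are placeholders, and on a real node the internal IP would come from the EC2 metadata service rather than a script argument.

```shell
# Hedged sketch of a boot-time helper for the Route53 idea above.
# On a real EC2 node the IP would come from the metadata service:
#   INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
INTERNAL_IP="${1:-10.0.0.12}"              # sample IP so the sketch runs anywhere
RECORD_NAME="${2:-node1.hadoop.internal.}" # hypothetical DNS name for this node

# Build an UPSERT change batch: creates the A record if missing, updates it otherwise.
CHANGE_BATCH=$(cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "${RECORD_NAME}",
      "Type": "A",
      "TTL": 60,
      "ResourceRecords": [{"Value": "${INTERNAL_IP}"}]
    }
  }]
}
EOF
)
echo "$CHANGE_BATCH"

# The batch would then be submitted with the AWS CLI, e.g.:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <your-zone-id> --change-batch "$CHANGE_BATCH"
```

A low TTL (60 seconds here) keeps stale addresses from being cached long after a restart.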

Viewing 6 replies - 1 through 6 (of 6 total)


  • Author
    Replies
  • #13881

    rajeev kaul
    Participant

Sorry, I decided to abandon working with HDP 1.1 (HMC) and try this with the new version, HDP 1.2 (Ambari). I will let you know what issues I face with it.

    #13717

    Larry Liu
    Moderator

    Hi, Rajeev,

Can you please provide hive.log from /var/log/hive and hive-site.xml from /etc/hive/conf?

    Thanks

    Larry

    #13716

    rajeev kaul
    Participant

    Hi Larry,

Yes, mysql is running, and I can log in using the hive credentials set up during the initial installation.

    Rajeev

    #13715

    Larry Liu
    Moderator

    Hi, Rajeev

    Are you running mysql? Please make sure mysql is running.

    Hope this helps

    Larry

    #13711

    rajeev kaul
    Participant

I attempted this, and it almost worked.

On restart, I used HMC to start all the services, which succeeded until it reached the Hive/HCatalog tests, where it failed. The end of hmc.log contained the following:

    [2013:01:17 01:20:35][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 24-42-0:STARTED:Hive/HCatalog start:COMPLETED
    [2013:01:17 01:20:35][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: HIVE – STARTED
    [2013:01:17 01:20:35][INFO][Service: HIVE (cciHadoop)][Service.php:130][setState]: HIVE – STARTED dryRun=
    [2013:01:17 01:20:35][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 24-48-0:IN_PROGRESS:Hive/HCatalog test:IN_PROGRESS

    [2013:01:17 01:20:57][INFO][PuppetInvoker][PuppetInvoker.php:237][createGenKickWaitResponse]: Response of genKickWait:
    Array
    (
    [result] => 0
    [error] =>
    [nokick] => Array
    (
    )

    [failed] => Array
    (
    [0] => hwks2.ec2.customercaresolutions.com
    )

    [success] => Array
    (
    )

    [timedoutnodes] => Array
    (
    )

    )

    [2013:01:17 01:20:57][INFO][Service: HIVE (cciHadoop)][Service.php:466][smoke]: Persisting puppet report for smoke testing HIVE
    [2013:01:17 01:20:57][ERROR][Service: HIVE (cciHadoop)][Service.php:473][smoke]: Service smoke check failed with Array
    (
    [result] => 0
    [error] =>
    [nokick] => Array
    (
    )

    [failed] => Array
    (
    [0] => hwks2.ec2.customercaresolutions.com
    )

    [success] => Array
    (
    )

    [timedoutnodes] => Array
    (
    )

    )

    [2013:01:17 01:20:57][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 24-48-0:FAILED:Hive/HCatalog test:FAILED
    [2013:01:17 01:20:57][INFO][OrchestratorDB][OrchestratorDB.php:556][setServiceState]: HIVE – FAILED
    [2013:01:17 01:20:57][INFO][Service: HIVE (cciHadoop)][Service.php:130][setState]: HIVE – FAILED dryRun=
    [2013:01:17 01:20:57][INFO][OrchestratorDB][OrchestratorDB.php:610][persistTransaction]: persist: 24-48-0:FAILED:Hive/HCatalog test:FAILED
    [2013:01:17 01:20:57][INFO][Cluster:cciHadoop][Cluster.php:810][startService]: Starting service HIVE complete. Result=-2
    [2013:01:17 01:20:57][INFO][ClusterMain:TxnId=24][ClusterMain.php:353][]: Completed action=startAll on cluster=cciHadoop, txn=24-0-0, result=-2, error=Service HIVE is not STARTED, smoke tests failed!

I tried stopping and restarting the services a few more times, but with the same results.

    #13710

    tedr
    Member

    Hi Rajeev,

As long as the names resolve correctly once you have done what you describe in your post, that should be enough. Since the various Hadoop services, as well as Nagios and Ganglia, address nodes by host name rather than IP, they should work fine.
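The resolution check Ted describes could be sketched as a quick loop like the one below. The host list is illustrative ("localhost" is a stand-in so the snippet runs anywhere); substitute your cluster nodes' DNS names.

```shell
# Sketch: confirm every cluster hostname resolves before starting services.
# Replace "localhost" with the DNS names of your cluster nodes.
for h in localhost; do
  if ip=$(getent hosts "$h" | awk '{print $1; exit}') && [ -n "$ip" ]; then
    echo "$h resolves to $ip"
  else
    echo "ERROR: $h does not resolve" >&2
  fi
done
```

Running this on each node after boot (once the Route53 records are updated) would confirm the names are usable before HMC tries to start the services.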

    I hope this helps,
    Ted.
