hmc fails after restart

This topic contains 26 replies, has 6 voices, and was last updated by  runeetv 2 years, 2 months ago.

  • Creator
    Topic
  • #6605

    Dario Rexin
    Member

    Hi,

    I installed HDP in VirtualBox on CentOS 5.8 and after every restart of the VM when trying to start hmc I get the following error:

    ———————
    (13)Permission denied: make_sock: could not bind to address [::]:8140
    (13)Permission denied: make_sock: could not bind to address 0.0.0.0:8140
    no listening sockets available, shutting down
    Unable to open logs
    ———————

    After yum erase hmc puppet and yum install hmc I can start it again, but I have to do the whole setup again. Is there anything I can do about that?

    Cheers,

    Dario

Viewing 26 replies - 1 through 26 (of 26 total)

The topic ‘hmc fails after restart’ is closed to new replies.

  • Author
    Replies
  • #7622

    runeetv
    Member

    If you are having problems with HMC after restarting or rebooting, please follow this thread:

    manually restarting HMC

    #7509

    Sasha J
    Moderator

    James,
    I understand your personal laptop issue. However, instead of shutting down your VM you can simply suspend or pause it. In that case it will not take any of the host’s resources and will continue from the same point after resuming.

    Thank you!
    Sasha

    #7508

    This hmc-agent thing was apparently my problem here. That prevents the puppet interactions that are needed.

    I am playing with the environment to learn how much I can rely on it — it is running on my personal laptop and I only want to have the test instance up when I am working with it. So I need to know how to suspend/stop things so that when I come back later, the environment will be usable.

    If this were in our data center, or hosted on EC2, I would of course not be doing the stops/suspends/resumes/starts — but this is my main day-to-day computing resource and I don’t want to leave resources consumed by VMs when not being used.

    Jim

    #7507

    Sasha J
    Moderator

    James,
    I will file a documentation enhancement request regarding hmc and hmc-agent.
    This should not be a problem in the upcoming release, as we added both hmc and hmc-agent to the system startup process, so both will be started automatically.
    For now, both have to be started manually.
    By the way, why do you keep shutting down your VM?

    Is this part of your test case?

    As we discussed earlier, once deployed, a cluster is never stopped unless there is a critical issue…

    Thank you!
    Sasha

    #7505

    OK — I will try this next time I re-boot.

    Where is the bit about hmc-agent documented for future reference? This is the first time I recall this being mentioned, especially that it needs a manual start just like hmc itself.

    Jim

    #7503

    Sasha J
    Moderator

    James,
    I will look at the logs later today.
    However, it seems like you forgot to start the hmc-agent process…
    There are two services, hmc and hmc-agent, and both have to be started.
    Please do:
    service hmc start
    service hmc-agent start

    Then connect to the hmc UI and start the HDP services you need.

    Please let us know the result.

    Thank you!
    Sasha
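
The two commands above can be wrapped so that a failed hmc start stops the sequence before hmc-agent is attempted. This is only a sketch: the function name `start_hmc_stack` and the `SERVICE_CMD` override are made up for illustration and are not part of HMC. It assumes both init scripts are installed and that it is run as root.

```shell
#!/bin/sh
# Start hmc, then hmc-agent, stopping at the first failure.
# SERVICE_CMD is a hypothetical override so the sequence can be dry-run;
# it defaults to the standard `service` wrapper used in the thread.
start_hmc_stack() {
    for svc in hmc hmc-agent; do
        if ! ${SERVICE_CMD:-service} "$svc" start; then
            echo "failed to start $svc" >&2
            return 1
        fi
    done
}
```

Calling `start_hmc_stack` as root after a reboot runs both starts in order; setting `SERVICE_CMD=echo` first only prints the commands that would be run. The HDP services themselves still have to be started from the hmc UI.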

    #7500

    I duplicated this situation and this time I saved the logs and uploaded them to the ftp server at hortonworks. The file name begins with jjsTestCluster.

    It seems after re-boot puppetkick fails. Maybe you can explain why from the logs.

    #7485

    Continuing my experiments. I did manage to stop all of the HDP services, stopped hmc, re-booted the machine, restarted hmc, and then tried to restart the services. The first service fails to start within 30 seconds and the only message in the operations log is that puppet kick failed for localhost.localdomain.

    Both selinux and the firewall are disabled.

    Looks like I have to start from scratch again…

    Maybe I should just wait for the next release?

    #7484

    Understood.

    But I was just in the process of doing more testing and it just took OVER 30 minutes to shut down all of the services. I was watching system activity using the CentOS system monitor and for a large number of these minutes, the CPU was under 10% utilization and most of the running processes were reported as sleeping. There were literally minutes going by, then a small spike of activity, and more sleeping.

    I can see being careful with starts, but it ought to be possible to stop things much faster.

    Jim

    #7476

    Sasha J
    Moderator

    James,
    again, as we discussed some time ago, restarting services is NOT a common situation and never happens in real life unless you have physical trouble with the nodes (hardware or networking).
    There will definitely be improvements in future releases, but as of today, HMC runs extensive tests after starting each service to make sure everything is running OK.

    Thank you!
    Sasha

    #7463

    OK — I must have missed the SELINUX advice in the docs. Thanks for including it here. Perhaps the docs could include advice for HOW to disable SELINUX at the outset since CentOS has this enabled from the start.

    Also, happy to learn that the hmc console may be reporting incorrect status — known bug and I should just restart the services regardless of what is reported.

    Now, if restarting the services were faster, all of this would be more tolerable.

    #7458

    Sasha J
    Moderator

    James,
    here is what the documentation says on http://hortonworks.com/download/thankyou_hdp1a/, the first page you are on after clicking the “HDPv1.0 Install” button:

    Preparing your cluster for the install

    - Select your target hosts and disable SELINUX, firewalls, and other security measures. Make sure each host can reach the Internet via HTTP, HTTPS, and FTP. (For local YUM installs, please read the detailed HMC User Guide)
    - Prepare passwordless SSH (via authorized_keys) for root user on all target install hosts
    - Create a text file with newline separated host entries (one per target host). Make sure to use fully-qualified domain names for each host. Make sure the DNS for each node works on all other nodes (both forward and reverse lookups). If you are not a DNS administrator, consider building up a consistent /etc/hosts file for all target nodes, listing all other nodes in the cluster by a fully-qualified domain name. NOTE: Without fully-qualified domain names, the install may or may not work but Hadoop will not run jobs properly after installation.
    - Enable NTP on the cluster – the clocks of all nodes in your cluster must be able to synchronize with each other
    - Make certain that all hosts have net-snmp and net-snmp-utils
    - Make sure the rpms listed below are either not installed or, if installed, are exactly these versions.
        - Ruby 1.8.5-24.el5
        - Puppet 2.7.9-2
        - Ruby Rack 1.1.0-2.el5
        - Passenger 3.0.12-1.el5.centos
        - Nagios 3.0.12-1.el5.centos
        - Nagios plug-ins 1.4.15-2.el5
        - Nagios Common 2.12-10.el5
        - MySQL v. 5.*

    To check SELINUX and the firewall, use the following:

    [root@rhel58-nn1 ~]# sestatus
    SELinux status: disabled
    [root@rhel58-nn1 ~]# /etc/init.d/iptables status
    Firewall is stopped.
    [root@rhel58-nn1 ~]#

    Incorrect service status reporting is a known bug, fixed in the next release.
    HMC keeps the status of the services in an internal database. When you restart the host (VM), the database is not updated, so it shows the old status instead of the current one.
    Please ignore this for now and make sure you start all the services yourself after restarting the VM.

    Thank you!
    Sasha
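
The status commands above show the current state only. To see whether the firewall will come back after a reboot on CentOS/RHEL 5, the output of `chkconfig --list iptables` can be inspected. The helper below is a sketch; the name `enabled_at_boot` is made up for this example, and it assumes the usual `0:off 1:off 2:on ...` runlevel columns in the chkconfig output.

```shell
#!/bin/sh
# Read `chkconfig --list <service>` output on stdin and succeed (exit 0)
# if the service is switched on in any runlevel, i.e. it would start at boot.
enabled_at_boot() {
    grep -Eq '[0-6]:on'
}
```

A possible use, run as root: `chkconfig --list iptables | enabled_at_boot && chkconfig iptables off`, followed by `service iptables stop` to stop the running firewall. For SELinux, the boot-time setting lives in /etc/sysconfig/selinux, and `sestatus` after a reboot confirms it.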

    #7456

    The steps about firewall and selinux are NOT documented as far as I know except now in these forums. Unless by firewall you mean the iptables command? If the firewall requirement is different from iptables, how do I make sure this is always off?

    How do I make sure that both are off after a reboot — what commands do I run to tell the status?

    There seem to be times when hmc reports services are started when in fact they are not! That’s what I was seeing after a re-boot and selinux disabled — the console reported the services were all started.

    I will have to see if I can get things to run in a more stable manner by following these additional pointers.

    Jim

    #7450

    Sasha J
    Moderator

    First of all, please follow the installation procedure precisely.
    1. Make sure the firewall is not running on your node(s).
    2. Make sure selinux is disabled.
    3. Make sure neither is started again after rebooting the box; if either one comes back, disable both and reboot the VM.
    4. Install hmc.

    In normal life, once started, a cluster is never stopped during its lifetime.
    In test environments, of course, starts and stops may happen many times.
    However, hmc is not designed to start automatically, and it is not designed to start services automatically.
    This is done on purpose.
    So, when you restart the box, nothing is started automatically.
    You should start hmc as root, then connect to its web interface and start all services (“StartAll”) or only the services you need. All dependencies will be started automatically.
    Services will not be started until you click the “start” button in the UI.

    Please do this and you will have a fully running cluster in a few minutes.

    Thank you!
    Sasha

    #7436

    Spent some more time on this. I uninstalled the cluster and then re-installed. Installation proceeds until the oozie install step, which fails. The deployment log mentions a problem with selinux, which I disabled as per this work-around suggestion:

    "nodeLogs": {
    "localhost.localdomain": {
    "reportfile": "/var/lib/puppet/reports/9-95-55/localhost.localdomain",
    "overall": "FAILED",
    "finishtime": "2012-07-18 23:00:45.745008 -04:00",
    "message": [
    "Loaded state in 0.00 seconds",
    "Not using expired catalog for localhost.localdomain from cache; expired at Wed Jul 18 23:00:35 -0400 2012",
    "Using cached catalog",
    "\"catalog supports formats: b64_zlib_yaml dot marshal pson raw yaml; using pson\"",
    "Caching catalog for localhost.localdomain",
    "Creating default schedules",
    "Loaded state in 0.00 seconds",
    "Applying configuration version '9-95-55'",
    "\"requires Exec[puppet_apply]\"",
    "\"requires Exec[untar_modules]\"",
    "Skipping device resources because running on a host",
    "Skipping device resources because running on a host",
    "Skipping device resources because running on a host",
    "Skipping device resources because running on a host",
    "Skipping device resources because running on a host",
    "Skipping device resources because running on a host",
    "\"file_metadata supports formats: b64_zlib_yaml marshal pson raw yaml; using pson\"",
    "\"FileBucket adding {md5}2301d39a18c9446c2df01ac7f2d90c5e\"",
    "Filebucketed /etc/puppet/agent/modules.tgz to puppet with sum 2301d39a18c9446c2df01ac7f2d90c5e",
    "&id002 \"content changed '{md5}2301d39a18c9446c2df01ac7f2d90c5e' to '{md5}82bd9094f4d477c0c7c772198a58ef14'\"",
    "\"The container Class[Manifestloader] will propagate my refresh event\"",
    "Executing 'rm -rf /etc/puppet/agent/modules ; tar zxf /etc/puppet/agent/modules.tgz -C /etc/puppet/agent/ --strip-components 3'",
    "Executing 'rm -rf /etc/puppet/agent/modules ; tar zxf /etc/puppet/agent/modules.tgz -C /etc/puppet/agent/ --strip-components 3'",
    "&id005 executed successfully",
    "\"The container Class[Manifestloader] will propagate my refresh event\"",
    "Executing 'sh /etc/puppet/agent/modules/puppetApply.sh'",
    "Executing 'sh /etc/puppet/agent/modules/puppetApply.sh'",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Could not retrieve selinux: Invalid argument - /proc/self/attr/current\"",
    "\"Wed Jul 18 23:00:49 -0400 2012 Puppet (debug): importing '/etc/puppet/agent/modules/hdp/manifests/init.pp' in environment production\"",
    "\"Wed Jul 18 23:00:49 -0400 2012 Puppet (debug): importing '/etc/puppet/agent/modules/hdp/manifests/params.pp' in environment production\"",
    "\"Wed Jul 18 23:00:49 -0400 2012 Puppet (debug): importing '/etc/puppet/agent/modules/hdp/manifests/namenode-conn.pp' in environment production\"",

    Now what?

    Jim

    #7430

    I tried this and I can restart hmc. But the cluster is reported as being down after I reboot. Is there a step needed to restart the cluster?

    How do I fix that? I am running with VMware player rather than Virtualbox.

    #6638

    Dario Rexin
    Member

    I just found the solution. The problem is SELinux. It does not allow httpd to open some log files and that is why httpd does not start. On my test system in /etc/sysconfig/selinux I set SELINUX=disabled and executed setenforce 0 and it worked! For production systems I would however recommend configuring SELinux properly instead of disabling it ;-).
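
A note on the two parts of this fix: `SELINUX=disabled` in /etc/sysconfig/selinux only takes effect at the next boot, while `setenforce 0` switches the running system to permissive mode right away. The sketch below is one way to check which mode the config file will apply on the next boot. The function name `selinux_config_mode` is made up for this example and is not part of HDP or the SELinux tools.

```shell
#!/bin/sh
# Print the mode set by the SELINUX= line of a config file
# (default: /etc/sysconfig/selinux). Commented-out lines and the
# SELINUXTYPE= line are ignored. Returns 1 if no mode is found.
selinux_config_mode() {
    conf="${1:-/etc/sysconfig/selinux}"
    # Keep only the value of uncommented SELINUX= lines; use the last one.
    mode=$(sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([a-z]*\).*$/\1/p' "$conf" 2>/dev/null | tail -n 1)
    [ -n "$mode" ] || return 1
    printf '%s\n' "$mode"
}
```

After a reboot, the value printed by `selinux_config_mode` can be compared with the output of `getenforce` or `sestatus` to confirm that the setting survived.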

    #6637

    Dario Rexin
    Member

    It is a virtual machine (VirtualBox) on a notebook that is stationary ;-). I’ll try your suggestions and report back here.

    Thanks!

    #6636

    Steve Loughran
    Participant

    This looks to me like the machine doesn’t really know who it is: it gets a new hostname from the network on startup, and on the next reboot it gets a different name.

    1. Is this a laptop?
    2. what does hostname -f return?
    3. If you reboot, what does hostname -f say next time?

    If your machine moves around or its hostname changes based on what the network tells it, it’s going to have to be set up in one of two ways:
    - give “localhost” as the hostname for the cluster
    - give your system a permanent hostname (I’m not going to go into the details there as I’ll probably forget something)
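
Questions 2 and 3 above can be automated by recording the fully-qualified name on each boot and comparing it with the previous one. This is only a sketch: the function name `check_hostname_stable`, the `STATE_FILE` location and the `HOSTNAME_CMD` override are all made up for illustration.

```shell
#!/bin/sh
# Compare the current FQDN with the one recorded on the previous run.
# Prints "first-run", "unchanged", or "changed: OLD -> NEW", then
# records the current name for the next comparison.
# STATE_FILE and HOSTNAME_CMD are hypothetical knobs for this sketch.
check_hostname_stable() {
    state="${STATE_FILE:-/var/tmp/last-fqdn}"
    current=$(${HOSTNAME_CMD:-hostname -f})
    if [ ! -f "$state" ]; then
        echo "first-run"
    elif [ "$(cat "$state")" = "$current" ]; then
        echo "unchanged"
    else
        echo "changed: $(cat "$state") -> $current"
    fi
    printf '%s\n' "$current" > "$state"
}
```

Running `check_hostname_stable` once before and once after a reboot shows whether the name the cluster was installed under is still the name the machine reports.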

    #6634

    Dario Rexin
    Member

    I also have a single host setup. I don’t think that resolving the hostname is a problem, because it works fine the first time I start hmc and setting up the “cluster” also worked. Only after rebooting the virtual machine I wasn’t able to start hmc anymore. But anyway… my /etc/hosts looks like this:

    127.0.0.1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6

    #6632

    Weiming Shi
    Member

    @Sasha, thank you for your reply.

    I installed hmc on a single node, so all I have is localhost.

    I succeeded in installing hmc the first time on my laptop.
    But afterwards, when I uninstalled hmc and tried to re-install it,
    it always failed with various kinds of errors.
    I have recorded most of the failure logs.
    Is there any place I can submit them to help the developers debug it?

    Thanks

    #6631

    Sasha J
    Moderator

    @Weiming
    This comment does not help at all…
    More details please.
    In general, the host has to be named and be resolvable in both directions.
    localhost.localdomain does not make any sense.

    Thank you!
    Sasha
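
"Resolvable in both directions" means the name maps to an address and that address maps back to the same name. A rough way to test this from the shell is sketched below. The function name `resolves_both_ways` and the `LOOKUP_CMD` override are made up for this example; the default lookup uses `getent hosts`, which consults /etc/hosts and DNS according to the system resolver configuration.

```shell
#!/bin/sh
# Succeed if NAME resolves to an address and that address resolves
# back to NAME (as the canonical name or one of its aliases).
# LOOKUP_CMD is a hypothetical override; the default is `getent hosts`.
resolves_both_ways() {
    name="$1"
    # Forward lookup: take the address from the first result line.
    addr=$(${LOOKUP_CMD:-getent hosts} "$name" | awk 'NR==1 {print $1}')
    [ -n "$addr" ] || return 1
    # Reverse lookup: look for NAME among the names returned for the address.
    ${LOOKUP_CMD:-getent hosts} "$addr" | awk -v n="$name" '
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

For example, `resolves_both_ways "$(hostname -f)"` can be run on every node. Note that a name such as localhost.localdomain will usually pass this check through /etc/hosts even though it is not a usable cluster hostname, so the check is necessary but not sufficient.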

    #6626

    Weiming Shi
    Member

    I met this issue too …

    #6614

    Sasha J
    Moderator

    OK, I’m waiting for more information from you.

    Sasha

    #6609

    Dario Rexin
    Member

    Thanks for the quick reply. I did all these things as root. “hostname -f” is “localhost.localdomain”. I’ll post the contents of my /etc/hosts tomorrow, when I’m at work.

    Thanks!

    Dario

    #6606

    Sasha J
    Moderator

    When you start it the first time, are there any error messages on the screen?
    What is your /etc/hosts content?
    What is the “hostname -f” output?
    Are you trying to restart hmc as the root user?

    This error message usually points to a non-root user trying to restart the Apache server (which is part of HMC)…

    Thank you!
    Sasha
