HDP on Linux – Installation: Errors after VM is suspended/resumed

This topic contains 10 replies, has 3 voices, and was last updated by  Sasha J 2 years, 3 months ago.

  • #7448

    I am probably guilty of mistreating my demo VM instance and expecting HDP to behave in an unstable situation.

    This may be related to my VM restart issues reported in another discussion.

    Now I am seeing errors after I suspend and resume a VM. I am not shutting down hmc or stopping any services prior to the suspension — should I be doing this?

    Anyway, after I suspend and resume the VM (running CentOS 5.8) and go to the hmc interface, I see some Critical errors reported. Nagios is even sending me emails about them. For example, here are some messages I just got:

    Notification Type: PROBLEM

    Service: HDFS::Corrupt/Missing blocks
    Host: localhost.localdomain
    Address: localhost.localdomain
    State: CRITICAL

    Date/Time: Thu Jul 19 12:23:21 EDT 2012

    Additional Info:

    CRITICAL: corrupt_blocks:0, missing_blocks:8, total_blocks:147

    and

    Notification Type: PROBLEM

    Service: HBASE::Percent region servers down
    Host: localhost.localdomain
    Address: localhost.localdomain
    State: CRITICAL

    Date/Time: Thu Jul 19 12:24:41 EDT 2012

    Additional Info:

    CRITICAL: total:1, affected:1

    I guess my question is whether there is anything to be done to recover from these errors?

    Are they to be expected?

    Again, I may be guilty of assuming too much tolerance in HDP for VM stops/starts and suspends/resumes.

    Jim
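
The two "Additional Info" lines quoted above follow a simple `name:value` pattern, so they can be pulled apart for a quick look at how much of the cluster is affected. This is a minimal sketch, not part of Nagios or HDP: `parse_nagios_info` is a hypothetical helper, and it assumes the alert text always has the shape shown in this post.

```python
import re

def parse_nagios_info(line):
    """Split a line such as 'CRITICAL: total:1, affected:1' into a state and integer counters."""
    # Everything before the first colon is the alert state (hypothetical format assumption).
    state, _, rest = line.strip().partition(":")
    counters = {}
    for name, value in re.findall(r"(\w+):(\d+)", rest):
        counters[name] = int(value)
    return state.strip(), counters

# The two alert lines quoted in the post above.
hdfs_state, hdfs = parse_nagios_info(
    "CRITICAL: corrupt_blocks:0, missing_blocks:8, total_blocks:147"
)
hbase_state, hbase = parse_nagios_info("CRITICAL: total:1, affected:1")

missing_pct = 100.0 * hdfs["missing_blocks"] / hdfs["total_blocks"]
print(f"HDFS {hdfs_state}: {hdfs['missing_blocks']} of {hdfs['total_blocks']} "
      f"blocks missing ({missing_pct:.1f}%)")
print(f"HBase {hbase_state}: {hbase['affected']} of {hbase['total']} region servers down")
```

Read this way, the HDFS alert reports zero corrupt blocks and a small share of missing ones, while the HBase alert covers the only region server on this single-node setup.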

Viewing 10 replies - 1 through 10 (of 10 total)

The topic ‘Errors after VM is suspended/resumed’ is closed to new replies.

  • #7656

    Sasha J
    Moderator

    Thanks for your continued interest in HDP, and for your efforts to pioneer the limits of how HDP integrates in virtual environments. Please let us know if you have further critical VM-related issues, as we are always available to work with you to achieve your goals.

    #7631

    Right, that’s what Sasha said so I don’t know why you asked for my contact info — to tell me again?

    I can turn off NTP to control external clock linking. I used vmware-tools to stop host time synchronization, so I will see if this is a more stable way to run the unsupported path I am on. This is counter to the installation advice for hmc, which states that NTP is required. But that may be for a true multi-node configuration; I am on a single-node cluster realized in a single VM.

    This ability is NOT on my critical path for anything at the moment — just trying to get a sense of the possible and what glitches I encounter along the way.

    #7626

    runeetv
    Member

    Hello James,

    HDP does not support suspend/resume in a Virtual Environment.

    If the cluster manages/syncs its own time, it should technically work, which would require you to disable any syncing of the virtual clock with the host clock or any external clock.

    By default, the VM will sync with either the host hardware clock or an external source (on the internet).

    Those syncs will cause services like HBase to fail quickly, and others to fail less quickly, if you suspended the VM while it was in the middle of any operation that has a monitor (timeout).

    Again, suspend/resume is not a supported mode of operation for the HDP stack. If this is in your critical path, someone from POC-SUPPORT will follow up with you offline.

    Adam
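
The timeout failure described above can be illustrated with a small sketch. It is a toy model and not HBase or HDP code: `is_timed_out`, the 60-second `TIMEOUT`, and the timestamps are all hypothetical values chosen for illustration. The assumption is that a monitor compares wall-clock timestamps, and that the guest clock is stepped forward to the host time on resume.

```python
def is_timed_out(last_heartbeat, now, timeout_seconds):
    """Return True if more wall-clock time than the timeout has passed since the last heartbeat."""
    return (now - last_heartbeat) > timeout_seconds

TIMEOUT = 60.0                 # hypothetical monitor timeout, in seconds
last_heartbeat = 1_000_000.0   # hypothetical wall-clock timestamp of the last heartbeat

# Case 1: the VM keeps running and the monitor checks 5 seconds later.
now_running = last_heartbeat + 5.0

# Case 2: the VM is suspended for a day right after the heartbeat; on resume
# the guest clock is synced to the host, so "now" jumps forward by 24 hours.
now_after_resume = last_heartbeat + 5.0 + 24 * 3600

print("running normally:", is_timed_out(last_heartbeat, now_running, TIMEOUT))
print("after suspend/resume:", is_timed_out(last_heartbeat, now_after_resume, TIMEOUT))
```

From the process's point of view only about five seconds of work happened in both cases, but in the second case the synced clock makes the heartbeat look a day old, so the monitor declares a timeout.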

    #7625

    An update: I just resumed the VM again and this time the console reports HBase down AND once again HDFS missing block errors. The VM was suspended yesterday and resumed just now.

    HDFS stopping and starting takes a bit of time as I recall, since there are services on top of HDFS that have to be stopped and started as well.

    So a resolution of this behavior would be appreciated if there is one.

    Jim

    #7624

    I did send some contact info in an email to support.

    Bottom line is: things are manageable now IF I keep the VM UI in the foreground and expect to have to restart some services when the hmc console reports critical errors. The restarts take a reasonable amount of time in this case.

    If I should expect no errors upon resumption, then I would appreciate any additional pointers that Hortonworks support might have.

    #7618

    runeetv
    Member

    @James

    Hi James,

    could you please send me your contact info so I can follow up with you regarding your VM issues?

    thanks in advance,

    Adam Brown

    #7502

    It seems I may run into some reported errors in the Monitoring portion of the hmc console, but I can recover from these by stopping and starting the affected services; after a bit of time, the errors disappear.

    So (see other discussion thread) since I can’t reliably stop services, re-boot VM and start services, I might as well live with this incremental pain in my self-hosted mini Hortonworks demo environment.

    Jim

    #7477

    Sasha J
    Moderator

    Hi James,

    Many components of Hadoop reference the time, and in a VM there are several things that could happen when you suspend (but don’t shut down):

    1) If the VM is using its own virtual clock, this should cause the least problems if you suspend but don’t shut down.

    2) If the VM is synced to the hardware clock OR configured to update its hardware clock every time it checks a time source, then you will immediately have problems related to timing and timeouts.

    3) That being said, while “technically” suspending a virtual machine should be invisible to the processes running on that instance, this is definitely not a tested scenario and is not supported.

    If possible, please do a proper shutdown of your VM’s processes and services.

    Sasha
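
One way to see which of the cases above applies is to compare the wall clock against a monotonic clock inside the guest. The sketch below uses Python's standard `time.time()` and `time.monotonic()`; `clock_step` and `sample` are hypothetical helper names. It assumes the monotonic clock is not adjusted when the wall clock is synced after a resume, which can depend on the platform and the hypervisor tools.

```python
import time

def clock_step(wall_before, mono_before, wall_after, mono_after):
    """Return how far the wall clock moved beyond what the monotonic clock measured."""
    return (wall_after - wall_before) - (mono_after - mono_before)

def sample():
    """Take one reading of both clocks."""
    return time.time(), time.monotonic()

# Take two samples a short interval apart; on an undisturbed system the
# difference between the two clocks' elapsed times should be close to zero.
w0, m0 = sample()
time.sleep(0.1)
w1, m1 = sample()
step = clock_step(w0, m0, w1, m1)
print(f"wall clock moved {step:.3f} s more than the monotonic clock")

# Hypothetical readings around a one-day suspend where the wall clock was
# stepped forward on resume but the monotonic clock only saw 5 seconds.
simulated = clock_step(100.0, 10.0, 100.0 + 86405.0, 15.0)
print(f"simulated step after resume: {simulated:.0f} s")
```

A large positive step after a resume would suggest the clock was synced to the host or an external source (case 2), while a value near zero would suggest the VM kept its own virtual time (case 1).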

    #7457

    Yes, NTP is on but the advanced option to Synchronize system clock before starting service was off. I turned that on just now. Will that help? Should I disable the Use Local Time Source option as well?

    #7451

    Sasha J
    Moderator

    Hi Jim

    Did you install NTP? And is it configured to update the hardware clock every time it syncs?

    sasha
