HDP on Linux – Installation Forum

Errors after VM is suspended/resumed

  • #7448

    I am probably guilty of mistreating my demo VM instance and expecting HDP to behave in an unstable situation.

    This may be related to my VM restart issues reported in another discussion.

    Now I am seeing errors after I suspend and resume a VM. I am not shutting down hmc or stopping any services prior to the suspension — should I be doing this?

    Anyway, after I suspend and resume the VM (running CentOS 5.8) and go to the hmc interface, I am seeing some Critical errors reported. Nagios is even sending me emails about them. For example, here are some messages I just got:

    Notification Type: PROBLEM

    Service: HDFS::Corrupt/Missing blocks
    Host: localhost.localdomain
    Address: localhost.localdomain
    State: CRITICAL

    Date/Time: Thu Jul 19 12:23:21 EDT 2012

    Additional Info:

    CRITICAL: corrupt_blocks:0, missing_blocks:8, total_blocks:147


    Notification Type: PROBLEM

    Service: HBASE::Percent region servers down
    Host: localhost.localdomain
    Address: localhost.localdomain
    State: CRITICAL

    Date/Time: Thu Jul 19 12:24:41 EDT 2012

    Additional Info:

    CRITICAL: total:1, affected:1

    I guess my question is whether there is anything to be done to recover from these errors?

    Are they to be expected?

    Again, I may be guilty of assuming too much tolerance in HDP for VM stops/starts and suspends/resumes.
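
For readers who want to put the alert numbers in proportion, here is a minimal Python sketch that parses the "Additional Info" line from the emails above. The `STATE: key:value, key:value` layout is inferred only from the two messages quoted in this post, and the function names `parse_nagios_info` and `missing_block_ratio` are hypothetical helpers, not part of Nagios or HDP.

```python
def parse_nagios_info(line):
    """Split 'STATE: key:value, key:value' into (state, {key: int}).

    The layout is an assumption based on the two alert emails quoted above.
    """
    state, _, rest = line.partition(":")
    metrics = {}
    for pair in rest.split(","):
        key, _, value = pair.strip().partition(":")
        metrics[key] = int(value)
    return state.strip(), metrics


def missing_block_ratio(metrics):
    """Fraction of blocks reported missing; 0.0 when no total is reported."""
    total = metrics.get("total_blocks", 0)
    return metrics.get("missing_blocks", 0) / total if total else 0.0


state, metrics = parse_nagios_info(
    "CRITICAL: corrupt_blocks:0, missing_blocks:8, total_blocks:147"
)
print(state, metrics)
print(f"{missing_block_ratio(metrics):.1%} of blocks reported missing")
```

For the HDFS alert quoted above, this works out to 8 of 147 blocks, roughly 5.4%.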


  • #7451
    Sasha J

    Hi Jim

    Did you install NTP? And is it configured to update the hw clock every time it syncs?



    Yes, NTP is on but the advanced option to Synchronize system clock before starting service was off. I turned that on just now. Will that help? Should I disable the Use Local Time Source option as well?

    Sasha J

    Hi James,

    many components of hadoop reference the time, and in a VM there are several things that could happen when you suspend (but don’t shut down)

    1) if the VM is using its own virtual clock
    this should cause the least problem, if you suspend but don’t shut down

    2) if the vm is synched to the hw clock OR configured to update its hw clock every time it checks a time source, then you will immediately have problems related to timing, and timeouts

    3) that being said, while “technically” suspending a virtual machine should be invisible to the processes running on that instance, this is definitely not a tested scenario and not supported.

    if possible please do a proper shutdown of your vm’s processes and services.
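
The difference between cases 1 and 2 can be sketched with a toy heartbeat timeout in Python. This is not HDP, HBase, or ZooKeeper code. The `FakeClock` class, the `lease_expired` helper, and the 60 second `SESSION_TIMEOUT` are all hypothetical, chosen only to show why a clock that is re-synced on resume trips timeouts while a purely virtual clock does not.

```python
class FakeClock:
    """Hand-driven clock so a suspend/resume can be simulated without waiting."""

    def __init__(self, start=0.0):
        self.now = start

    def advance(self, seconds):
        self.now += seconds


def lease_expired(last_heartbeat, now, timeout):
    """True when more than `timeout` seconds separate the heartbeat from `now`."""
    return (now - last_heartbeat) > timeout


SESSION_TIMEOUT = 60.0  # hypothetical value, not an HDP default

synced = FakeClock()   # case 2: re-synced to the host or a time source on resume
virtual = FakeClock()  # case 1: only counts time while the VM is running

last_beat_synced = synced.now
last_beat_virtual = virtual.now

# Five seconds of normal running: both clocks advance together.
synced.advance(5)
virtual.advance(5)

# VM suspended for a day, then resumed: only the synced clock jumps forward.
synced.advance(24 * 3600)

print(lease_expired(last_beat_synced, synced.now, SESSION_TIMEOUT))    # True
print(lease_expired(last_beat_virtual, virtual.now, SESSION_TIMEOUT))  # False
```

In this model the process on the synced clock sees a day pass between two heartbeats and declares the peer dead, even though neither side did anything wrong.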



    Seems like I may run into some reported errors in the Monitoring portion of the hmc console, but I can recover from these by stopping and starting the affected services, and after a bit of time these errors disappear.

    So (see other discussion thread) since I can’t reliably stop services, re-boot VM and start services, I might as well live with this incremental pain in my self-hosted mini Hortonworks demo environment.




    Hi James,

    could you please send me your contact info so I can follow up with you regarding your VM issues?

    thanks in advance,

    Adam Brown


    I did send some contact info in an email to support.

    Bottom line is: things are manageable now IF I keep the VM UI in the foreground and expect to have to restart some services when the hmc console reports critical errors. The restarts take a reasonable amount of time in this case.

    If I should expect no errors upon resumption, then I would appreciate any additional pointers that Hortonworks support might have.


    An update: I just resumed the VM again and this time the console reports HBase down AND once again HDFS missing block errors. The VM was suspended yesterday and resumed just now.

    HDFS stopping and starting takes a bit of time as I recall, since there are services on top of HDFS that have to be stopped and started as well.

    So a resolution of this behavior would be appreciated if there is one.



    Hello James,

    HDP does not support suspend/resume in a Virtual Environment.

    If the cluster manages/syncs its own time, it should technically work – which would require you to disable any synching of the virtual clock with the host clock or any external clock.

    by default you will sync with either the host hw clock or an external source (on the internet)

    those will cause services like HBase to quickly fail, and others will fail less quickly, if you have suspended the VM while it is in the middle of any operations that have a monitor (timeout)

    again, suspend / resume is not a supported mode of operation for the HDP stack, if this is in your critical path, someone from POC-SUPPORT will follow up with you offline.
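
One way to check whether a resume actually stepped the guest clock is to compare wall-clock elapsed time against monotonic elapsed time. The Python sketch below assumes that the monotonic clock is not stepped when the wall clock is re-synced, which is how `time.monotonic()` is documented to behave. The helper names `clock_jump` and `sample` and the 2 second tolerance are hypothetical, and this is not something HDP or hmc does for you.

```python
import time


def sample():
    """Return (wall_clock_seconds, monotonic_seconds) taken together."""
    return time.time(), time.monotonic()


def clock_jump(wall_before, wall_after, mono_before, mono_after, tolerance=2.0):
    """Seconds the wall clock moved beyond monotonic elapsed time.

    Returns 0.0 when the difference is within `tolerance` (an assumed value).
    """
    drift = (wall_after - wall_before) - (mono_after - mono_before)
    return drift if abs(drift) > tolerance else 0.0


# Synthetic readings: 5 s of running time, but the wall clock moved a day further.
print(clock_jump(1000.0, 87405.0, 50.0, 55.0))  # 86400.0

# Live readings would be taken with sample() before and after a suspend/resume.
w0, m0 = sample()
w1, m1 = sample()
print(clock_jump(w0, w1, m0, m1))
```

A nonzero result after a resume suggests the guest clock was stepped, which is the situation described above where monitored operations time out.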



    Right, that’s what Sasha said so I don’t know why you asked for my contact info — to tell me again?

    I can turn off NTP to control external clock linking. I used vmware-tools to stop host time synchronization. So I will see if this is a more stable way to run the unsupported path I am on. This is counter to the installation advice for hmc, which states that NTP is required. But that may be for a true multi-node configuration. I am on a single node cluster realized in a single VM.

    This ability is NOT on my critical path for anything at the moment — just trying to get a sense of the possible and what glitches I encounter along the way.

    Sasha J

    Thanks for your continued interest in HDP, and your efforts to pioneer the limits of how HDP integrates in virtual environments. Please let us know if you have further critical VM related issues, as we are always available to work with you to achieve your goals.

The topic ‘Errors after VM is suspended/resumed’ is closed to new replies.
