Ambari corrupts rpmdb


This topic contains 6 replies, has 4 voices, and was last updated by  Dan Dietterich 1 month, 2 weeks ago.

  • Creator
    Topic
  • #48379

    PulsatingTaurus
    Participant

    Hi,
    I have set up an HDP 2.0 cluster on 5 virtual nodes (OS: CentOS 6.0 on VirtualBox). When I stop and restart ALL processes using Ambari, most of the time a few daemons (a different set each time) fail to start, and I see the following error(s) in the logs. As a workaround I have to manually run "rm -f /var/lib/rpm/__db.00*" and restart the processes. After this fix, the processes start normally.
    What I have observed is that on every start Ambari tries to install the packages on the nodes (or at least checks whether they are available), and somehow this corrupts the RPMDB.

    Please suggest whether this behavior is due to environment settings or to a flaw in the cluster setup/configuration. If there is an inherent problem with Ambari, is there any workaround/fix?

    ERROR ==>

    err: /Stage[1]/Hdp::Snappy::Package/Hdp::Package[snappy]/Hdp::Package::Process_pkg[snappy]/Package[snappy]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy' returned 1: rpmdb: Thread/process 1757/139873492805376 failed: Thread died in Berkeley DB library
    error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db3 - (-30974)
    error: cannot open Packages database in /var/lib/rpm
    CRITICAL:yum.main:

    Error: rpmdb open failed
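
    A script that wraps yum could recognize this failure from the command output before deciding whether to attempt recovery. A minimal Ruby sketch, assuming the signature lines are worded as in the log above (other rpm/yum versions may differ); the helper name `rpmdb_corrupt?` is hypothetical:

```ruby
# Patterns copied from the error log above; treat them as assumptions
# for other rpm/yum versions.
RPMDB_ERROR_PATTERNS = [
  /DB_RUNRECOVERY/,
  /cannot open Packages database/,
  /rpmdb open failed/
].freeze

# Returns true if the given yum/rpm output looks like a broken rpmdb.
def rpmdb_corrupt?(output)
  RPMDB_ERROR_PATTERNS.any? { |pattern| output =~ pattern }
end
```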

Viewing 6 replies - 1 through 6 (of 6 total)


  • Author
    Replies
  • #58155

    Dan Dietterich
    Participant

    My machine has 4K pages and 32GB memory. I do not know how much memory was available at the point of corruption. I solved the corruption problem by putting a wrapper on top of yum that uses a file lock to serialize all installations. I conclude that the problem is a concurrency bug in yum or the layers below it. My Ruby wrapper script for yum is:

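    #!/usr/bin/ruby
    # The real yum binary, renamed so this wrapper can take its place
    COMMAND = "yum_real"

    # Do the command in a locked region so that installs run one at a time.
    # CREAT creates the lock file on first use; the path is relative to the
    # current working directory.
    File.open("yum.rb.lock", File::RDWR | File::CREAT) {|f|
      f.flock(File::LOCK_EX)
      # Pass the arguments as a list so that quoting is preserved
      system(COMMAND, *ARGV)
      # Hand yum's exit status back to the caller
      exit($?.exitstatus || 1)
    }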
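
    The wrapper relies on `flock` to serialize callers: a second process that opens the same lock file blocks in `LOCK_EX` until the first one closes it. A small sketch of that behavior, with hypothetical helper names `with_exclusive_lock` and `locked?`; it uses `LOCK_NB` only to probe the lock without waiting:

```ruby
# Runs the block while holding an exclusive flock on lock_path.
# The lock is released when the file is closed at the end of the block.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX) # waits until no other holder remains
    yield
  end
end

# Returns true if another open file currently holds the lock on lock_path.
def locked?(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # With LOCK_NB, flock returns false instead of waiting.
    acquired = f.flock(File::LOCK_EX | File::LOCK_NB)
    !acquired
  end
end
```

    A wrapper built on this would call something like `with_exclusive_lock(path) { system("yum_real", *ARGV) }`, where `path` is an absolute lock file location of your choosing so that every caller shares one lock regardless of its working directory.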

    #58140

    Alejandro Fernandez
    Participant

    Hi Dan and PulsatingTaurus, can you reply with how much RAM was allocated to your agent, and the memory block size?
    I found some related issues on the Red Hat Bugzilla page that may point to the RAM being less than 1GB, and/or the memory block size being less than 4KB.

    https://bugzilla.redhat.com/show_bug.cgi?id=680508

    https://bugzilla.redhat.com/show_bug.cgi?id=923201

    https://bugzilla.redhat.com/show_bug.cgi?id=1033013

    #55743

    Dan Dietterich
    Participant

    I am seeing a similar problem when pushing a configuration into Ambari through the REST API. I expect this is not an Ambari problem, but that Ambari is tickling a data corruption problem in yum/rpm. Still, I think Ambari should provide a programmable method of recovering from such a failure.

    Is there such a method? Or do I really have to touch every node in the cluster manually, rebuild their rpm databases, and then use the Ambari dashboard to manually complete the installations and start the services?

    My environment is RedHat nodes in AWS EC2. Not enough characters in this box to provide the blueprint and host mapping json…

    #48773

    Kenny Zhang
    Moderator

    This problem doesn’t seem to be related to Ambari or any particular program. It looks like a YUM problem or some OS environmental issue.
    Can you reproduce this problem with a script that loops, installing and then removing a single application (e.g. httpd) 10 or 100 times?

    Kenny
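
    Such a reproduction loop could look roughly like the following Ruby sketch. The function name `stress_yum` and the injectable `runner` parameter are hypothetical; the runner exists only so the loop can be dry-run without touching the real package database:

```ruby
# Installs and removes `package` repeatedly. Returns the 1-based iteration
# at which a yum command first failed, or nil if every cycle succeeded.
# `runner` receives the argv array and must return true on success.
def stress_yum(package, iterations, runner: ->(argv) { system(*argv) })
  (1..iterations).each do |i|
    [%w[yum -y install], %w[yum -y remove]].each do |base|
      return i unless runner.call(base + [package])
    end
  end
  nil
end
```

    Running `stress_yum("httpd", 100)` as root on a test node would exercise the real yum; a non-nil result gives the cycle at which the first failure appeared.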

    #48505

    PulsatingTaurus
    Participant

    Thanks Kenny.
    The question is not about how to fix the problem. I have used the same steps before to get rid of it. However, I see this problem recurring over and over on the cluster.
    So the question is, is there any bug in Ambari that results in RPMDB corruption? Please note that I have no parallel install or yum update operation going on when the cluster is being started.

    #48443

    Kenny Zhang
    Moderator

    Hi Pulsating,

    Can you run the commands below and see if they fix the problem?

    1- rm /var/lib/rpm/__db*
    2- rpm --rebuilddb

    Please let me know.

    Thanks,
    Kenny
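
    For repeated use across nodes, these two steps can be scripted. A rough Ruby sketch, assuming no yum or rpm process is running at the time; it uses `rpm --rebuilddb` (the rpm option that rebuilds the database indexes) together with `--dbpath`, and the function name `recover_rpmdb` and its `runner` parameter are hypothetical:

```ruby
require "fileutils"

# Deletes the Berkeley DB region files (__db.*) under rpm_dir, then asks rpm
# to rebuild its indexes. Returns the list of files that were removed.
# Assumption: nothing else is using the rpm database while this runs.
def recover_rpmdb(rpm_dir: "/var/lib/rpm", runner: ->(argv) { system(*argv) })
  stale = Dir.glob(File.join(rpm_dir, "__db.*"))
  FileUtils.rm_f(stale)
  rebuild = ["rpm", "--dbpath", rpm_dir, "--rebuilddb"]
  raise "rpm --rebuilddb failed" unless runner.call(rebuild)
  stale
end
```

    Calling `recover_rpmdb` with no arguments as root targets the default /var/lib/rpm; only the `__db.*` files are deleted, and the Packages file itself is left in place.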
