Ambari Forum

Ambari corrupts rpmdb

  • #48379
    PulsatingTaurus
    Participant

    Hi,
    I have set up an HDP 2.0 cluster using 5 virtual nodes (OS: CentOS 6.0 on VirtualBox). When I stop and restart ALL processes using Ambari, a few daemons (randomly chosen) fail most of the time. I see the following error(s) in the logs. As a workaround I have to manually run "rm -f /var/lib/rpm/__db.00*" and restart the processes; after this fix, the processes start normally.
    What I have observed is that every time, Ambari tries to install the packages on the nodes (or at least checks whether they are available), and somehow this corrupts the rpmdb.

    Please suggest whether this behavior is due to environment settings or to a flaw in the cluster setup/configuration. If there is an inherent problem with Ambari, is there any workaround/fix?

    ERROR ==>

    err: /Stage[1]/Hdp::Snappy::Package/Hdp::Package[snappy]/Hdp::Package::Process_pkg[snappy]/Package[snappy]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install snappy' returned 1: rpmdb: Thread/process 1757/139873492805376 failed: Thread died in Berkeley DB library
    error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db3 - (-30974)
    error: cannot open Packages database in /var/lib/rpm
    CRITICAL:yum.main:

    Error: rpmdb open failed


  • #48443
    Kenny Zhang
    Moderator

    Hi Pulsating,

    Can you try running the commands below and see if that fixes the problem?

    1- rm -f /var/lib/rpm/__db*
    2- rpm --rebuilddb

    Please let me know.

    Thanks,
    Kenny
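    Kenny's two steps, plus a cache clean, can be wrapped in a small shell function to run on each affected node (a sketch, assuming root and that no other package operation is in flight; the `dbdir` parameter is an addition for flexibility, not part of Kenny's commands):

```shell
# Recovery sketch: drop the stale Berkeley DB environment files and
# rebuild the rpm Packages index. Run as root on each affected node.
rebuild_rpmdb() {
  dbdir=${1:-/var/lib/rpm}    # rpmdb location; override only for testing
  rm -f "$dbdir"/__db.* &&    # remove stale __db.00* environment files
  rpm --rebuilddb &&          # rebuild the Packages index from the headers
  yum clean all               # discard yum caches that may reference old state
}
```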

    #48505
    PulsatingTaurus
    Participant

    Thanks Kenny.
    The question is not about how to fix the problem. I have used the same steps before to get rid of it; however, I see this problem recurring again and again on the cluster.
    So the question is: is there a bug in Ambari that results in rpmdb corruption? Please note that no parallel install or yum update operation is going on when the cluster is being started.

    #48773
    Kenny Zhang
    Moderator

    This problem doesn’t seem to be related to Ambari or any particular program. It looks like a yum problem or some OS environment issue.
    Just wondering: can you reproduce this problem with a script that loops, installing and then removing a single package (e.g. httpd) 10 or 100 times?

    Kenny
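    Kenny's suggested reproducer could be sketched as a shell loop like this (the pass count, the quiet yum flags, and the rpmdb sanity check are assumptions, not his exact recipe):

```shell
# Stress-test sketch: install and remove httpd N times and stop at the
# first failure, checking the rpmdb is still readable after each pass.
stress_rpmdb() {
  n=${1:-10}
  i=1
  while [ "$i" -le "$n" ]; do
    yum -d 0 -e 0 -y install httpd || { echo "install failed on pass $i"; return 1; }
    yum -d 0 -e 0 -y remove httpd  || { echo "remove failed on pass $i";  return 1; }
    rpm -q basesystem >/dev/null   || { echo "rpmdb broken after pass $i"; return 1; }
    i=$((i + 1))
  done
  echo "completed $n passes cleanly"
}
```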

    #55743
    Dan Dietterich
    Participant

    I am seeing a similar problem when pushing a configuration into Ambari through the REST API. I expect this is not an Ambari problem as such, but Ambari is tickling a data-corruption problem in yum/rpm. Still, I think Ambari should provide a programmatic method of recovering from such a failure.

    Is there such a method? Or do I really have to touch every node in the cluster manually and rebuild their rpm databases and then use the Ambari dashboard to manually complete the installations and start the services??

    My environment is RedHat nodes in AWS EC2. Not enough characters in this box to provide the blueprint and host mapping json…

    #58140

    Hi Dan and PulsatingTaurus, are you able to reply with how much RAM was allocated to your agents, and the memory page size?
    I found some related issues on Red Hat Bugzilla that may point to the RAM being less than 1GB and/or the page size being less than 4KB.

    https://bugzilla.redhat.com/show_bug.cgi?id=680508
    https://bugzilla.redhat.com/show_bug.cgi?id=923201
    https://bugzilla.redhat.com/show_bug.cgi?id=1033013

    #58155
    Dan Dietterich
    Participant

    My machine has 4K pages and 32GB of memory. I do not know the available memory at the point of corruption. I solved the corruption problem by putting a wrapper on top of yum that uses a file lock to serialize all installations. I conclude that the problem is a concurrency bug in yum or the layers below it. My Ruby yum wrapper is:

    #!/usr/bin/ruby
    # Build the command line for the real yum binary (renamed to yum_real).
    COMMAND = "yum_real"
    cmd = COMMAND
    while ARGV[0]
      cmd = cmd + " " + ARGV.shift
    end

    # Run the command while holding an exclusive lock on a file, so only
    # one yum invocation at a time can touch the rpmdb (File::CREAT makes
    # the lock file on first use instead of failing when it is missing).
    File.open("yum.rb.lock", File::RDWR | File::CREAT) { |f|
      f.flock(File::LOCK_EX)
      system(cmd)
    }
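    For comparison, the same serialization can be achieved without Ruby using flock(1) from util-linux (a sketch; `yum_real` is the renamed original binary as in Dan's script, and the lock-file location is an assumption):

```shell
# Serialize yum invocations with an exclusive file lock, so concurrent
# callers queue up instead of racing on the rpmdb.
yum_locked() {
  lockfile=${YUM_LOCKFILE:-/var/lock/yum-serial.lock}
  (
    flock -x 9          # block until we hold the exclusive lock on fd 9
    yum_real "$@"       # forward all arguments to the real yum
  ) 9>"$lockfile"
}
```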

