Optimal VM for laptop demo


This topic contains 8 replies, has 3 voices, and was last updated by  James Solderitsch 2 years, 8 months ago.

  • Creator
  • #7008

    I have a Core i7 laptop (Dell) with 8 GB of memory running VMware Player (the latest version) under Windows 7 Enterprise. My test Hortonworks 1.0 VM is CentOS 5.8 running via VMware Player on this laptop. I have this VM configured with 2 cores and 2 GB of RAM.

    With Sasha’s patient help, I got HDP 1.0 running there.

    But even though it looks like I have enough free RAM, and Windows Task Manager reports that my CPU is not stressed, standard HDP administration tasks take VERY long. For example, when I enabled WebHDFS, the process of reconfiguring the cluster and getting it back to operational status took over an hour!

    Ari, in his recent webinar (running, as I recall, on a Mac with VMware Fusion), was getting very quick admin task responses. I did notice that the single VM he used was listed as 1 core with only 1 GB of memory.

    Will I get better response if I use a “smaller” VM?

    What is the minimum VM configuration needed to deploy a self-hosted cluster?

    Any performance tips in general? I realize this is not how you really use Hadoop, but for learning, I want something lean and mean.

Viewing 8 replies - 1 through 8 (of 8 total)


  • Author
  • #7071

    Thanks for the reply. I don’t recall “dryrun” mode being mentioned, although I do remember the disclaimer that 20 to 40 minutes was normal in real life. I was seeing times longer than that, but I suspect it is partly due to slow networking here and VMware running poorly on my memory-constrained laptop.

    Sasha explained that subsequent service stops and starts are also slow in order to be “safe”. I was expecting slow installs but less-slow service stops and starts and I stand corrected here too.


    Ari Zilka

    That webinar had HMC in “dryrun” mode, meaning all actions were scripted and not actually running. What is taking so long is that your tasks are network-bound and/or Puppet is just polling for a few minutes waiting for completion. Since it is polling, it can end up sleeping longer than needed (there is no event from the Puppet agent telling us the tasks are done). We intend to speed things up and shorten the polling windows in near-term releases, but net-net it needs to run this slowly to be safe that cluster ops are succeeding, and you should not be stopping your cluster on a regular basis anyway.
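    As an illustration of the polling behavior described above (not HMC’s actual code), here is a minimal sketch of fixed-interval polling; `is_done`, the interval, and the timeout are all hypothetical. The key point is that with no completion event from the agent, the poller can sleep up to a full interval past the moment the task actually finishes:

    ```python
    import time

    def wait_for_completion(is_done, poll_interval=30.0, timeout=1800.0):
        """Poll is_done() until it returns True or the timeout expires.

        Without a push notification from the agent, the poller can sleep up
        to poll_interval seconds past the moment the task completes, which
        is one reason stop/start appears slower than the work itself.
        """
        deadline = time.monotonic() + timeout
        while True:
            if is_done():
                return True
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False  # timed out before the task reported done
            time.sleep(min(poll_interval, remaining))
    ```

    Shortening `poll_interval` reduces the oversleep but increases load on the agents, which is the trade-off behind shrinking the polling windows in later releases.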


    Great, thanks for the confirmation that this sort of slowness is expected behavior. I appreciate that in an operational situation, you would not be stopping and starting services.

    I think the extraordinary startup speed shown in Ari’s recent Hortonworks webinar led me to the wrong expectation in this regard.


    Sasha J

    In a normal situation, you do not need to stop/start the cluster at all. Once started, it runs forever (at least until a failure occurs)…
    The start/stop sequence is slow because of the extensive testing during the process to make sure all components are stopped/started correctly.
    This is expected behavior, and you should not focus on this stop/start timing.
    Increasing memory definitely gives you a faster cluster during normal operations.

    Thank you!


    I set up a fresh 64-bit CentOS instance on a different machine, my MacBook Air with a flash disk.

    Everything installed nicely.

    Is it normal that when you stop all services from Manage Services, it takes many minutes for the stop to complete? I would have expected that simply stopping them would not take a long time.

    Same thing for starting them up. Each start-and-test process seems to take longer than I would have expected. I still have a constrained VM, so maybe allocating more RAM to the VM would help the start-up and stop experience?

    Just trying to adjust my performance expectations.


    Sasha J

    As for WebHDFS:
    it is not presented as a separate service.
    When it is enabled and HDFS is running, you should be able to communicate with HDFS through the REST API.
    Like this:
    When I point my browser to http://mycluster:50070/webhdfs/v1/tmp?op=LISTSTATUS
    I see the following:


    Take a look at the docs: http://hadoop.apache.org/common/docs/r1.0.3/webhdfs.html
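    For reference, the URL Sasha shows follows the WebHDFS v1 pattern `http://<namenode>:50070/webhdfs/v1/<path>?op=<OP>`. Here is a small helper that builds such URLs; the function name is just for illustration, and port 50070 is the classic NameNode HTTP port used by HDP 1.0:

    ```python
    from urllib.parse import quote, urlencode

    def webhdfs_url(host, path, op, port=50070, **params):
        """Build a WebHDFS v1 REST URL for an HDFS path and operation.

        Extra keyword arguments become additional query parameters
        (e.g. offset/length for OPEN).
        """
        if not path.startswith("/"):
            path = "/" + path
        query = urlencode({"op": op, **params})
        return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    ```

    For example, `webhdfs_url("mycluster", "/tmp", "LISTSTATUS")` reproduces the URL above; fetching it with any HTTP client against a running cluster should return a JSON `FileStatuses` listing.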

    Thank you!


    Sasha J

    Hi James,

    Re: VM

    For some reference, we have a single node that can be spun up in EC2, and that is a large or extra-large instance, which is 8+ GB of RAM.

    You will want to allocate (at a minimum) 4 GB to your VM; remember, out of that 4 GB, Hadoop will end up with only a portion.

    2 GB is definitely not enough.
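    If you prefer editing the VM’s configuration file directly rather than using the VMware Player UI, the relevant settings live in the VM’s .vmx file. A sketch, assuming a 4 GB / 2 vCPU allocation (`memsize` is in MB; the file path depends on where you created the VM):

    ```
    memsize = "4096"
    numvcpus = "2"
    ```

    Make the change while the VM is powered off, then verify the new allocation inside the guest with `free -m`.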



    Also, after waiting all that time, it looks like WebHDFS was never actually added to my running HDFS. The web UI implies you can make this change after setting up the cluster initially.

    I even restarted HMC.
