
HDP on Linux – Installation Forum

Optimal VM for laptop demo

  • #7008

    I have a Core i7 laptop (Dell) with 8 GB of memory running VMware Player (the latest version) under Windows 7 Enterprise. My test Hortonworks 1.0 VM is CentOS 5.8 running via VMware Player on this laptop. I have this VM configured with 2 cores and 2 GB of RAM.

    With Sasha’s patient help, I got HDP 1.0 running there.

    But even though it looks like I have enough free RAM, and Windows Task Manager reports that my CPU is not stressed, standard HDP administration tasks take a VERY long time. For example, when I enabled WebHDFS, the process of reconfiguring the cluster and getting it back to operational status took over an hour!

    Ari, in his recent webinar (running, as I recall, on a Mac with VMware Fusion), was getting very quick admin task responses. I did notice that the single VM he used was listed as 1 core with only 1 GB of memory.

    Will I get better response if I use a “smaller” VM?

    What is the minimum VM configuration needed to deploy a self-hosted cluster?

    Any performance tips in general? I realize this is not how you really use Hadoop, but for learning, I want something lean and mean.

  • Author
  • #7009

    Also, after waiting all that time, it looks like WebHDFS was never actually enabled on my running HDFS. The web UI implies you can make this change after setting up the cluster initially.

    I even restarted HMC.

    Sasha J

    Hi James,

    Re: VM

    For some reference, we have a single node that can be spun up in EC2, and that is a large or extra-large instance, which has 8+ GB of RAM.

    You will want to allocate (at a minimum) 4 GB to your VM. Remember, out of that 4 GB, Hadoop will end up with only a portion.

    2 GB is definitely not enough.
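    As a rough illustration of why Hadoop ends up with only a portion of a 4 GB VM: the OS and management services take a slice, and each Hadoop daemon runs its own JVM with its own heap. The numbers below are illustrative placeholders, not HDP defaults.

```python
# Illustrative memory budget for a 4 GB single-node VM.
# All figures are placeholders, not actual HDP daemon heap defaults.
total_mb = 4096
os_and_services_mb = 1024          # OS, HMC/agents, monitoring, etc.

daemons = {                        # one JVM heap per Hadoop daemon
    "namenode": 512,
    "datanode": 512,
    "jobtracker": 512,
    "tasktracker": 512,
}
hadoop_daemons_mb = sum(daemons.values())

# What remains for actual map/reduce task JVMs:
left_for_tasks_mb = total_mb - os_and_services_mb - hadoop_daemons_mb
print(left_for_tasks_mb)
```

    Under these assumed numbers, only about a quarter of the VM is left for actual task work, which is why 2 GB total leaves essentially nothing.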


    Sasha J

    As for WebHDFS:
    it is not presented as a separate service.
    When it is enabled and HDFS is running, you should be able to communicate with HDFS through the REST API.
    Like this:
    When I point my browser to http://mycluster:50070/webhdfs/v1/tmp?op=LISTSTATUS
    I see the following:
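    (The JSON output that followed was not preserved in this archive.) For reference, a LISTSTATUS call returns a JSON document of the shape `{"FileStatuses": {"FileStatus": [...]}}`, and can be fetched and parsed like this; the hostname/port are the ones from the thread, and the sketch assumes WebHDFS is enabled on a reachable NameNode:

```python
import json
from urllib.request import urlopen

def parse_list_status(payload):
    """Extract the list of FileStatus dicts from a WebHDFS
    LISTSTATUS response, which is wrapped as
    {"FileStatuses": {"FileStatus": [...]}}."""
    return payload["FileStatuses"]["FileStatus"]

def list_status(namenode, path):
    """Fetch LISTSTATUS for `path` from the NameNode's WebHDFS
    endpoint. `namenode` is host:port, e.g. "mycluster:50070"."""
    url = f"http://{namenode}/webhdfs/v1{path}?op=LISTSTATUS"
    with urlopen(url) as resp:
        return parse_list_status(json.load(resp))

# Example (requires a running NameNode with WebHDFS enabled):
# for status in list_status("mycluster:50070", "/tmp"):
#     print(status["pathSuffix"], status["type"], status["length"])
```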


    Take a look at the docs:

    Thank you!


    I set up a fresh 64-bit CentOS instance on a different machine — my MacBook Air with a flash disk.

    Everything installed nicely.

    Is it normal that when you stop all services from Manage Services, it takes many minutes for the stop to complete? I would have expected that simply stopping them would not take a long time.

    Same thing for starting them up. Each start-and-test process seems to take longer than I would have expected. I still have a constrained VM, so maybe allocating more RAM to the VM would help this start-up and shutdown experience?

    Just trying to adjust my performance expectations.

    Sasha J

    In a normal situation, you do not need to stop/start the cluster at all. Once started, it runs forever (at least until a failure occurs)…
    The start/stop sequence is slow because of the extensive testing during the process to make sure all components are stopped/started correctly.
    This is expected behavior and you should not focus on this stop/start timing.
    Increasing memory definitely gives you a faster cluster during normal operations.

    Thank you!


    Great, thanks for the confirmation that this sort of slowness is expected behavior. I appreciate that in an operational situation, you would not be stopping and starting services.

    I think the extraordinary startup speed shown in Ari's recent Hortonworks webinar led me to the wrong expectations in this regard.

    Ari Zilka

    That webinar had HMC in “dryrun” mode, meaning all actions were scripted and not actually running. What is taking so long is that your tasks are network-bound and/or Puppet is just polling for a few minutes waiting for completion. Since it is polling, it can end up sleeping longer than needed (there is no event from the Puppet agent telling us the tasks are done). We intend to speed things up and shorten the polling windows in near-term releases, but net-net it needs to run this slowly to be safe that cluster operations are succeeding, and you should not stop your cluster on a regular basis.
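    The polling behavior described above — sleep, check, repeat, with no push event on completion, so up to one full interval can be wasted after the task actually finishes — can be sketched like this. The interval, timeout, and `is_done` callback are illustrative, not HMC's actual implementation.

```python
import time

def wait_for_completion(is_done, interval_s=30, timeout_s=2400):
    """Poll is_done() every interval_s seconds until it returns True
    or timeout_s elapses. Returns True on completion, False on timeout.
    Because there is no completion event, the caller can sleep up to
    one full interval after the underlying task has already finished."""
    waited = 0
    while waited < timeout_s:
        if is_done():
            return True
        time.sleep(interval_s)   # no event to wake us early
        waited += interval_s
    return False
```

    Shortening `interval_s` reduces the wasted tail sleep but increases polling traffic, which is the trade-off mentioned for future releases.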


    Thanks for the reply. I don’t recall “dryrun” mode being mentioned, although I do remember the disclaimer that 20 to 40 minutes was normal in real life. I was seeing times longer than that, but I suspect that is partly due to slow networking here and VMware running poorly on my memory-constrained laptop.

    Sasha explained that subsequent service stops and starts are also slow in order to be “safe.” I was expecting slow installs but faster service stops and starts, so I stand corrected here too.

The forum ‘HDP on Linux – Installation’ is closed to new topics and replies.
