
Ambari Forum

HDP setup but trying to understand how to access it

  • #15161

    I used Ambari to create a simple 3-node HDP cluster and everything seems to work fine. I did a simple HDFS and MapReduce test using the respective accounts. I'm new to Hadoop and was wondering how to go about setting up a user account (perhaps called HadoopTest1) with the appropriate rights to run jobs and work with HDFS. I've been spoiled by the HDP Sandbox, where I had a graphical interface to upload files and work with the various components (grunt, HCatalog, etc.). I didn't have to log in or anything; I think a user account was created called sandbox. I'm currently trying to understand how I can interface with my cluster. In standalone configurations, I simply open a terminal and start using the hadoop utility (no login necessary) to upload files to HDFS, call MR jobs, etc.

    Just checking to see if someone could point me to some links that can help me start using my HDP cluster once it's set up (e.g. creating the necessary accounts).

    Thanks !
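    The account setup asked about above usually has two parts: a local OS user on the node the person logs in from, plus a home directory in HDFS owned by that user. A minimal sketch, reusing the hypothetical HadoopTest1 name from the question (run as root; group names and superuser conventions may differ between distributions):

    ```shell
    # Create the OS account on the node the user will log in from.
    useradd HadoopTest1

    # As the hdfs superuser, create the user's HDFS home directory
    # and hand ownership over to the new account.
    su - hdfs -c "hadoop fs -mkdir /user/HadoopTest1"
    su - hdfs -c "hadoop fs -chown HadoopTest1:HadoopTest1 /user/HadoopTest1"

    # The new account can now upload files and run jobs, e.g.:
    # su - HadoopTest1 -c "hadoop fs -put data.txt /user/HadoopTest1/"
    ```

    These commands need a live cluster and root access, so treat them as a template rather than a copy-paste script.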

  • #15168
    Sasha J

    I believe you should start from some Hadoop tutorials.
    Take a look at the following links:
    And the Hadoop book, of course.

    Thank you!


    Thanks Sasha.
    The links you sent show a basic getting-started guide for Hadoop and describe MapReduce, Pig, and many other things. They also show how to configure Hadoop in a virtualized environment. I don't have problems running Hadoop in a VM, as everything is typically already done and set up for you (user accounts created, etc.).

    I guess what I'm wanting is to figure out how to set up a gateway node, as shown in the diagram at the bottom of this link:

    From the gateway node, I should be able to run MR jobs and interact with the file system without having to be on one of the datanodes.
    If you have a few moments, can you point me to something that shows how to install and properly configure a gateway machine to participate in/connect with my HDP cluster? If there's a book, I wouldn't mind checking that out as well.

    As always thanks for your help.

    Sasha J

    I see…
    Take a look here:

    Just set up the repository on your gateway node, install the needed RPMs (I believe HDFS and MapReduce are what you need for now), then copy the /etc/hadoop/conf folder from a cluster node to your gateway machine.
    Verify that you can access HDFS (with the hadoop fs command) and submit a job (with the hadoop jar command) from your gateway machine.

    This is it!
    Enjoy your working gateway!

    Thank you!
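    The steps in the reply above can be sketched as follows, assuming an RPM-based gateway node. The package name, host name, and example jar path are illustrative and may differ between HDP versions:

    ```shell
    # 1. With the HDP yum repository configured on the gateway,
    #    install the Hadoop client packages.
    yum install -y hadoop-client

    # 2. Copy the cluster's client configuration from an existing
    #    cluster node ("cluster-node" is a placeholder host name).
    scp -r root@cluster-node:/etc/hadoop/conf /etc/hadoop/

    # 3. Verify HDFS access and job submission from the gateway.
    hadoop fs -ls /
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10
    ```

    Since these commands only make sense against a running cluster, treat them as a template for the steps rather than a runnable script.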


    I started going through the doc mentioned above, but figured Ambari might be able to handle all of this for me. I went back to the Ambari machine and added a new host (specifying only the client and tools needed).
    That did the trick. Thanks!

    Do you know of any web front ends that I could load on my client gateway machine that would allow Hive and Pig Latin queries to be issued? I like the front end that comes with the HDP Sandbox. It makes it really easy to upload files to HDFS and work with HCatalog, Hive, and Pig scripts. Just curious. Thanks!

    Sasha J

    You can take a look at the HUE project; this is the front end used in the Sandbox.
    Google for it; it is an open-source project and should be available on GitHub.

    Thank you!

