
The legacy Hortonworks Forum is now closed. A read-only version of the former site remains available; it will be taken offline on January 31, 2016.

Hortonworks Sandbox Forum

Using Talend Big Data Studio with the Sandbox

  • #14908

    Today’s webinar showed a Hortonworks employee accessing the sandbox from Talend. Any instructions on how to do that on my own? Setup advice would be appreciated.

  • #14912
    Sasha J

In order to have Talend connect to HDFS on your Sandbox, you have to point it at the NameNode:
use the IP address the Sandbox displays on its screen, and port 8020.
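For example (using `` as a stand-in for whatever address your sandbox actually shows), the filesystem URI you give Talend is just that IP plus port 8020:

```shell
# is a hypothetical address; substitute the IP your sandbox
# prints on its console at boot.
echo "$NAMENODE_URI"   # this is the HDFS URI to point Talend at
# Optional reachability check before configuring Talend (requires netcat):
# nc -z -w 5 8020 && echo "NameNode RPC port is open"
```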

Hope this helps.

    Thank you!

    Carter Shanklin

    James, I have done this and it does work, though it requires a bit of additional setup on your client machine. For best results I suggest using the Hortonworks Talend bundle because it ships with a bunch of built-in workflows.

First, download the Hortonworks-branded Talend Open Studio for Big Data from the Hortonworks website, because it includes a number of sample workflows in the demos/ directory. Import these into Talend by right-clicking under Job Designs and selecting the archive after clicking “Select archive file”.

After importing you will have a bunch of jobs under Job Designs and an entry called HDP under Contexts. Edit the context group and view the values as a table, then update the values for namenode_host, templeton_host, etc. to be the IP address of your sandbox. Change the user to sandbox.

Many of the jobs also require a variable called hdfs_test_dir, which needs to be changed to something like /user/sandbox/test.

Finally, many of the Hadoop actions within Talend require the client to be able to resolve the DNS name sandbox to the IP address of the sandbox. Not all actions require this, but many do. To get everything working you will need to edit your computer’s “hosts” file and add an entry that maps your sandbox’s IP address to the name sandbox.

There are various resources online for editing hosts files if you’ve not done this before. This part is pretty ugly and we’re looking to eliminate the need for it in the future.
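Concretely, the edit amounts to appending one line (`` below is a placeholder for your own sandbox IP; note that macOS keeps the file at /private/etc/hosts). Working on a scratch copy first lets you review before touching the real file:

```shell
# Work on a scratch copy so a typo can't break name resolution.
cp /etc/hosts /tmp/hosts.new
# is a placeholder -- substitute your sandbox's IP address.
printf '  sandbox\n' >> /tmp/hosts.new
grep 'sandbox' /tmp/hosts.new      # confirm the entry landed
# Then apply it (requires admin rights):
#   sudo cp /tmp/hosts.new /etc/hosts
```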

    Hope that helps.


Thanks, Carter, for the careful instructions. I have gone as far as downloading the Talend zip file and starting it up (on my MacBook Pro), and I found the demo zip. I have yet to try to make connections to the Sandbox VM. Is there a file that describes a “happy path” execution of a job defined in Talend against HDP (in my case, inside the Sandbox), so that files are created in HDFS and you can see results in the file browser, or perhaps back in the Talend UI? The webinar mentioned Apache logs as input. Does the demo include such a log?

    I appreciate the encouragement to think of the sandbox as a full HDP pseudo-cluster.

I may try to install R Studio and the Revolution Analytics rmr2 library in the sandbox. I have done this against a Cloudera demo VM, but their Hue instance is missing the Pig and HCatalog panels, so I think I would rather be working in an HDP environment.

    Carter Shanklin

    James the designs under HDFS and HCATALOG will create files you can see in the file browser. Be sure to change the paths to /user/sandbox/something. The HCATALOG designs will also create tables you can query in Hive or Pig.

    If you do try Revo or R in the Sandbox we’d be very interested to hear how it goes.


In another thread, I documented my success in making R Studio Server and the Revolution Analytics rmr2 library accessible in the sandbox. It all went pretty smoothly, except for needing to know the sandbox user’s password. Sasha suggested that I could change this password myself when logged in as root and that nothing bad should happen. After doing that, things seem to be working as expected for me.

    Kai Waehner

    Hi guys,

I installed the Sandbox and Talend and did what Carter described. Now my sandbox is running, and HDP’s Talend examples for HDFS work perfectly. However, the Pig examples do not work.

    Before changing the hosts file, I got this error when executing a PIG example:

    13/03/08 17:05:57 ERROR security.UserGroupInformation: PriviledgedActionException as:kwaehner unknown host: sandbox
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: 0% complete
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    13/03/08 17:05:58 WARN mapReduceLayer.Launcher: There is no log file to write to.
    13/03/08 17:05:58 ERROR mapReduceLayer.Launcher: Backend error message during job submission unknown host: sandbox

After I added the IP address of my sandbox plus the name “sandbox” to the hosts file, I get an UnknownHostException again. This time, the stack trace uses another name: speedport_w723_v_typ_a_1_00_096 (the same name I see in my terminal). What is this name, and more importantly, how can I solve the problem and run the Talend Pig job against the Hortonworks Sandbox?

    13/03/08 17:10:28 INFO mapReduceLayer.MapReduceLauncher: 0% complete
    13/03/08 17:10:36 INFO mapred.JobClient: Cleaning up the staging area hdfs://sandbox:8020/user/kwaehner/.staging/job_201303080406_0009
    13/03/08 17:10:36 ERROR security.UserGroupInformation: PriviledgedActionException as:kwaehner speedport_w723_v_typ_a_1_00_096: speedport_w723_v_typ_a_1_00_096: nodename nor servname provided, or not known
    13/03/08 17:10:36 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
    13/03/08 17:10:36 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    13/03/08 17:10:36 WARN mapReduceLayer.Launcher: There is no log file to write to.
    13/03/08 17:10:36 ERROR mapReduceLayer.Launcher: Backend error message during job submission speedport_w723_v_typ_a_1_00_096: speedport_w723_v_typ_a_1_00_096: nodename nor servname provided, or not known


    Hi Kai,

Can you post the contents of your hosts file here, along with the output of hostname -f? I’m trying to see where it is coming up with that strange hostname.
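Both can be gathered from a terminal; a minimal sketch (getent is Linux-only; on macOS, dscacheutil -q host -a name sandbox does the lookup instead):

```shell
# What name does the Hadoop client embed in job submissions?
hostname -f 2>/dev/null || hostname
# Active hosts entries (macOS keeps this file at /private/etc/hosts):
cat /etc/hosts
# Does the name 'sandbox' actually resolve from this machine?
getent hosts sandbox || echo "the name 'sandbox' does not resolve"
```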

    Kai Waehner

    Sure. “hostname -f” shows exactly this strange name: speedport_w723_v_typ_a_1_00_096. This is also the name of my DSL / WLAN router. I do not know why my Mac has this strange hostname.

    Here is my hosts file (I just added the last line for Hortonworks sandbox):

    cat /private/etc/hosts
    # Host Database
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    ## localhost broadcasthost
    ::1 localhost
    fe80::1%lo0 localhost sandbox


    Hi Kai,

A couple of things here. First, is the IP that you put in the /etc/hosts file for the sandbox the same IP that the sandbox tells you to connect your browser to? Second, you could try setting the hostname of your Mac, or adding the “speedport…” name to your hosts file.


    Kai Waehner

    Hi Ted,

Yes, that is the IP I (have to) use in my browser for the sandbox. It works perfectly.

You mean I should try mapping “speedport…” to that IP in the hosts file?



    Hi Kai,

Not exactly. I mean getting the IP of your Mac and then mapping that to the “speedport…” name in your /etc/hosts.
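Something like this (en0 is the usual macOS interface name and is an assumption here; the speedport name is the one from your error messages):

```shell
# Find the machine's current IP: ipconfig getifaddr works on macOS,
# and `hostname -I` is the Linux fallback.
MYIP=$(ipconfig getifaddr en0 2>/dev/null || hostname -I | awk '{print $1}')
# The line to append to /etc/hosts (macOS: /private/etc/hosts):
echo "$MYIP  speedport_w723_v_typ_a_1_00_096"
```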


    Kai Waehner

    Hi Ted,

    thanks for your help…

Today I wanted to try your proposal from a café, and I found out that my hostname / IP is NOT “speedport…” but a typical IP address, which is also shown in my Mac’s settings.

    Now, the Pig jobs are running perfectly!

So it is definitely just a configuration problem at home. I probably have to find out why I get this strange “speedport…” hostname from my router! I will also try your proposal at home of mapping my IP to the “speedport…” hostname in the hosts file.


    Hi Kai,

Now that I know the IP the sandbox comes up with is different in two different locations, it makes me think you may have missed the part of the sandbox setup where it is configured to use host-only networking. Did you set the sandbox to use host-only networking?


    Mur Raguthu

    Hi Team,
I am new to the group and am trying to install Talend with the Sandbox. I was able to install Talend (64-bit Win 7) successfully but am struggling to integrate it. BTW, I didn’t see the demos in the Talend software. How do I get them? Please help me; I really appreciate it.

    Kai Waehner

@Ted: Uh, right. I had to reinstall the sandbox (and converted it from VMware to Parallels). This time, I did not configure the host-only networking. That must be the reason! I will try to configure it…

@Mur: the demos archive is only included if you download Talend from Hortonworks’ website! You can also just download that bundle and then use the demos in your own Talend installation (that is what I did, because I already had Talend installed).


    Mur Raguthu

Thanks, Kai. I tried to download the kit, but the link was broken.

    Please guide me.


Hi Mur,
The link you provided accesses Hortonworks HDP 1.1 for Windows, which is in beta. I believe you need to go here:

It will then automatically download the tar file, which, when extracted, should contain the file Kai is referring to.

    Hope that helps.

    Mur Raguthu

Robert, I was able to download HDP & Talend successfully via that link and to configure everything successfully. Now I can talk to HDFS and start doing our POCs. Great stuff, and thanks for your help. We have a lot of ideas and are looking forward to realizing them.

Chris West

Do I need to configure the settings within the system settings, or in the DHCP server that allows all connections? Or will it configure automatically?


The forum ‘Hortonworks Sandbox’ is closed to new topics and replies.
