Using Talend Big Data Studio with the Sandbox

This topic contains 18 replies, has 7 voices, and was last updated by  Mur Raguthu 1 year, 5 months ago.

  • Creator
    Topic
  • #14908

    Today’s webinar showed a Hortonworks employee accessing the sandbox from Talend. Any instructions on how to do that on my own? Setup advice would be appreciated.

Viewing 18 replies - 1 through 18 (of 18 total)


  • Author
    Replies
  • #17646

    Mur Raguthu
    Member

    Robert, I was able to download HDP and Talend via that link and configure them successfully. Now I am able to talk to HDFS and start doing our POCs. Great stuff, and thanks for your help. We have a lot of ideas and are looking forward to realizing them.

    #17472

    Robert
    Participant

    Hi Mur,
    The link you provided accesses Hortonworks HDP 1.1 for Windows, which is in beta. I believe you need to go here:

    http://hortonworks.com/thankyou-hdp12-talend/?mdl=13577&ao=2&lnk=1

    Then it will automatically download the tar file which, when extracted, should contain the HDPDEMOS.zip file that Kai is referring to.

    Hope that helps.
    Regards,
    Robert

    #17471

    Mur Raguthu
    Member

    Thanks, Kai. I tried to download the kit, but the link was broken.

    http://public-repo-1.development.hortonworks.com/HDP-Win/1.1/Beta/hdp-1.1.0-160.winpkg.msi/

    Please guide me.

    #17430

    Kai Waehner
    Member

    @Ted: Uh, right. I had to reinstall (and converted it from VMware to Parallels). This time, I did not configure the host stuff. That should be the reason! Will try to configure it…

    @Mur: HDPDEMOS.zip is only included if you download Talend from the Hortonworks website! You can also just download that bundle and then use the demos in your own Talend installation (that is what I did, because I already had Talend installed).

    Kai

    #17415

    Mur Raguthu
    Member

    Hi Team,
    I am new to the group and I am trying to use Talend with the Sandbox. I was able to install Talend (64-bit Windows 7) successfully, but I am struggling to integrate the two. By the way, I didn’t see HDPDEMOS.zip in the Talend software. How do I get it? Please help me. I really appreciate your help.

    #17321

    tedr
    Member

    Hi Kai,

    Now that I know that the IP the sandbox comes up with is different in two different locations, I think you may have missed the part of the sandbox setup where it is set to use host-only networking. Did you set the sandbox to use host-only networking?

    Thanks,
    Ted.

    #17301

    Kai Waehner
    Member

    Hi Ted,

    thanks for your help…

    Today, I wanted to try your proposal from a café. There, I found out that my hostname / IP is NOT “speedport…” but a typical IP address (192.168.1.49). This is also shown in the settings of my Mac.

    Now, the Pig jobs are running perfectly!

    So it is definitely just a configuration problem at home. Probably, I have to find out why I get this strange “speedport…” hostname from my router! I will also try out your proposal at home to map my IP to the “speedport…” hostname in the hosts file.

    #17238

    tedr
    Member

    Hi Kai,

    Not exactly. I mean getting the IP of your Mac and then mapping that to the “speedport…” name in your /etc/hosts.
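
For illustration, the mapping described here could be sketched roughly as follows, in Python. The function name `ensure_hosts_entry` and the address 192.168.2.10 are hypothetical placeholders (the thread does not state the Mac's IP at home); the hostname is the one reported by `hostname -f` in this thread. The sketch only builds the new file text and does not write to /etc/hosts, which normally requires administrator rights.

```python
def ensure_hosts_entry(text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text that maps hostname, appending a line if none exists."""
    wanted = hostname.lower()
    for line in text.splitlines():
        # Ignore comments, then split into "<ip> <name> [alias...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return text  # already mapped; leave the text untouched
    if text and not text.endswith("\n"):
        text += "\n"
    return text + f"{ip} {hostname}\n"


# Hypothetical example values: substitute your own machine's IP address.
current = "127.0.0.1 localhost\n10.37.129.3 sandbox\n"
updated = ensure_hosts_entry(current, "192.168.2.10", "speedport_w723_v_typ_a_1_00_096")
print(updated)
```

Calling the function a second time with the same arguments returns the text unchanged, so it is safe to re-run.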

    Thanks,
    Ted.

    #17221

    Kai Waehner
    Member

    Hi Ted,

    yes, 10.37.129.3 is the IP which I (have to) use in my browser for the sandbox. Works perfectly.

    You mean I should try mapping “speedport…” to 10.37.129.3 in the hosts file?

    Kai

    #17149

    tedr
    Member

    Hi Kai,

    A couple of things here: is the IP that you put in the /etc/hosts file for the sandbox the same IP that the sandbox tells you to connect your browser to? You could also try setting the hostname of your Mac, or adding the “speedport…” name to your hosts file.

    Thanks,
    Ted.

    #16981

    Kai Waehner
    Member

    Sure. “hostname -f” shows exactly this strange name: speedport_w723_v_typ_a_1_00_096. This is also the name of my DSL / WLAN router. I do not know why my Mac has this strange hostname.

    Here is my hosts file (I just added the last line for Hortonworks sandbox):

    cat /private/etc/hosts
    ##
    # Host Database
    #
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    ##
    127.0.0.1 localhost
    255.255.255.255 broadcasthost
    ::1 localhost
    fe80::1%lo0 localhost
    10.37.129.3 sandbox

    #16963

    tedr
    Member

    Hi Kai,

    Can you post the contents of the hosts file here, along with the output of hostname -f? I'm trying to see where it is coming up with that strange hostname.
    Thanks,
    Ted.

    #16939

    Kai Waehner
    Member

    Hi guys,

    I installed the Sandbox and Talend, and I did what Carter described. Now my sandbox is running, and HDP’s Talend examples for HDFS work perfectly. However, the Pig examples, for instance, do not work.

    Before changing the hosts file, I got this error when executing a Pig example:

    13/03/08 17:05:57 ERROR security.UserGroupInformation: PriviledgedActionException as:kwaehner cause:java.net.UnknownHostException: unknown host: sandbox
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: 0% complete
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
    13/03/08 17:05:58 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    13/03/08 17:05:58 WARN mapReduceLayer.Launcher: There is no log file to write to.
    13/03/08 17:05:58 ERROR mapReduceLayer.Launcher: Backend error message during job submission
    java.net.UnknownHostException: unknown host: sandbox

    After I added the IP address of my sandbox plus “sandbox” to the hosts file, I get an UnknownHostException again. This time, the stack trace uses another name: speedport_w723_v_typ_a_1_00_096 (the same name that I see in my terminal). What is this name and, more importantly, how can I solve the problem so that I can run the Talend Pig job using the Hortonworks Sandbox?

    13/03/08 17:10:28 INFO mapReduceLayer.MapReduceLauncher: 0% complete
    13/03/08 17:10:36 INFO mapred.JobClient: Cleaning up the staging area hdfs://sandbox:8020/user/kwaehner/.staging/job_201303080406_0009
    13/03/08 17:10:36 ERROR security.UserGroupInformation: PriviledgedActionException as:kwaehner cause:java.net.UnknownHostException: speedport_w723_v_typ_a_1_00_096: speedport_w723_v_typ_a_1_00_096: nodename nor servname provided, or not known
    13/03/08 17:10:36 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
    13/03/08 17:10:36 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    13/03/08 17:10:36 WARN mapReduceLayer.Launcher: There is no log file to write to.
    13/03/08 17:10:36 ERROR mapReduceLayer.Launcher: Backend error message during job submission
    java.net.UnknownHostException: speedport_w723_v_typ_a_1_00_096: speedport_w723_v_typ_a_1_00_096: nodename nor servname provided, or not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
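
The second trace ends in java.net.InetAddress.getLocalHost, which suggests that the client is now failing to resolve its own hostname rather than "sandbox". A quick way to check both names from the client machine is sketched below in Python. This is only an approximation: it uses Python's `socket` module rather than the Java resolver that Talend uses, so results may differ in edge cases, and the helper names `resolves` and `diagnose` are made up for this example.

```python
import socket


def resolves(name: str) -> bool:
    """Return True if the system resolver can turn name into an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror (a subclass of OSError) signals an unknown host;
        # UnicodeError can be raised for names that cannot be encoded.
        return False


def diagnose(names):
    """Map each name to whether it currently resolves on this machine."""
    return {name: resolves(name) for name in names}
```

Running `diagnose([socket.gethostname(), "sandbox"])` on the client should show which of the two names is the problem. If the machine's own hostname comes back as False, a hosts-file entry for that hostname (or a changed hostname) is the likely fix.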

    #15131

    In another thread, I documented my success in making R Studio server and the Revolution Analytics rmr2 library accessible in the sandbox. It all went pretty smoothly except needing to know the sandbox user’s password. Sasha suggested that I could change this password myself when logged in as root and nothing bad should happen. After doing that, things seem to be working as expected for me.

    #14981

    Carter Shanklin
    Participant

    James, the designs under HDFS and HCATALOG will create files you can see in the file browser. Be sure to change the paths to /user/sandbox/something. The HCATALOG designs will also create tables you can query in Hive or Pig.

    If you do try Revo or R in the Sandbox we’d be very interested to hear how it goes.

    #14968

    Thanks, Carter, for the pretty careful instructions. I have gone as far as downloading the Talend zip file and starting it up (on my MacBook Pro), and I found the demo zip. I have yet to try to make connections to the Sandbox VM. Is there a file that describes a “happy path” execution of a job defined in Talend against HDP (in my case inside the Sandbox), so that files are created in HDFS and you can see results in the file browser, or perhaps back in the Talend UI? The webinar mentioned Apache logs as input. Does the demo include such a log?

    I appreciate the encouragement to think of the sandbox as a full HDP pseudo-cluster.

    I may try to install R Studio and Revolution Analytics rmr2 library in the sandbox — I have done this against a Cloudera demo VM but their Hue instance is missing the Pig and HCatalog panels so I think I would rather be working in an HDP environment.

    #14926

    Carter Shanklin
    Participant

    James, I have done this and it does work, though it requires a bit of additional setup on your client machine. For best results I suggest using the Hortonworks Talend bundle because it ships with a bunch of built-in workflows.

    First, download the Hortonworks-branded Talend Open Studio for Big Data from hortonworks.com/download, because it includes a number of sample workflows within the demos/HDPDEMOS.zip file. Import these into Talend by right-clicking under Job Designs, clicking “Select archive file”, and then selecting the file.

    After importing, you will have a bunch of jobs under Job Designs and an entry called HDP under Contexts. Edit the context group and view the values as a table. Next, update the values for namenode_host, templeton_host, etc. to be the IP address of your sandbox. Change the user to sandbox.

    Many of the jobs also require a variable called hdfs_test_dir which needs to be changed to something like /user/sandbox/test.
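
As a rough illustration of the values described above, the context can be pictured as a small set of key/value pairs. The Python sketch below is not Talend's own format; it just mirrors the variables named in this post and checks the two rules mentioned (hosts are set, and the test directory sits under the user's HDFS home). The key name `user` and the function `check_context` are assumptions made for this example.

```python
from typing import Dict, List


def check_context(ctx: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems found in the context values."""
    problems = []
    for key in ("namenode_host", "templeton_host"):
        if not ctx.get(key):
            problems.append(f"{key} is not set")
    # Paths are expected to live under /user/<user>/ in HDFS.
    home = f"/user/{ctx.get('user', '')}/"
    if not ctx.get("hdfs_test_dir", "").startswith(home):
        problems.append(f"hdfs_test_dir should be under {home}")
    return problems


context = {
    "namenode_host": "172.16.1.100",   # example sandbox IP; substitute your own
    "templeton_host": "172.16.1.100",
    "user": "sandbox",                 # hypothetical key name for the user variable
    "hdfs_test_dir": "/user/sandbox/test",
}
print(check_context(context))  # prints []
```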

    Finally, many of the Hadoop actions within Talend require the client to be able to resolve the DNS name sandbox to the IP address of the sandbox. Not all actions require this but many do. To get everything to work you will need to edit your computer’s “hosts” file and add an entry for Sandbox.

    It would look something like:
    172.16.1.100 sandbox

    There are various resources online for editing hosts files if you’ve not done this before. You would substitute in your own sandbox IP in place of this one. This part is pretty ugly and we’re looking to eliminate the need to do it in the future.
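
To confirm that the entry is in place, the hosts file can be read back and searched for the name. The sketch below (Python, with a hypothetical helper `parse_hosts`) parses hosts-file text into a name-to-IP mapping; it works on a sample string here, using the example address from above, and could be pointed at the contents of /etc/hosts (or the Windows equivalent) instead.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first mapping seen for a given name.
            mapping.setdefault(name.lower(), ip)
    return mapping


SAMPLE = """\
# Host Database
127.0.0.1 localhost
172.16.1.100 sandbox
"""
hosts = parse_hosts(SAMPLE)
print(hosts.get("sandbox"))  # prints 172.16.1.100
```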

    Hope that helps.

    #14912

    Sasha J
    Moderator

    James,
    In order to connect Talend to HDFS on your Sandbox, you have to point it to the NameNode:
    use the IP address that the Sandbox displays on its screen, and port 8020.
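
Before configuring Talend, it can help to confirm that the NameNode port is reachable from the client machine at all. A minimal sketch in Python, assuming only that the NameNode listens on TCP port 8020 as stated above; the function name `namenode_reachable` is made up for this example, and a successful connection shows only that something is listening on the port, not that HDFS is healthy.

```python
import socket


def namenode_reachable(host: str, port: int = 8020, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, `namenode_reachable("<your sandbox IP>")` should return True once the VM is up and its network is set up correctly.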

    Hope this helps.

    Thank you!
    Sasha
