Using Talend Big Data Studio with the Sandbox

This topic contains 18 replies, has 7 voices, and was last updated by  Mur Raguthu 2 months, 1 week ago.

  • Creator
    Topic
  • #14908

    Today’s webinar showed a Hortonworks employee accessing the sandbox from Talend. Any instructions on how to do that on my own? Setup advice would be appreciated.

Viewing 3 replies - 16 through 18 (of 18 total)

  • Author
    Replies
  • #14968

    Thanks, Carter, for the careful instructions. I have gone as far as downloading the Talend zip file and starting it up (on my MacBook Pro), and I found the demo zip. I have yet to try connecting to the Sandbox VM. Is there a file that describes a “happy path” execution of a job defined in Talend against HDP (in my case inside the Sandbox), so that files are created in HDFS and you can see the results in the file browser, or perhaps back in the Talend UI? The webinar mentioned Apache logs as input. Does the demo include such a log?

    I appreciate the encouragement to think of the sandbox as a full HDP pseudo-cluster.

    I may try to install RStudio and the Revolution Analytics rmr2 library in the sandbox. I have done this against a Cloudera demo VM, but their Hue instance is missing the Pig and HCatalog panels, so I think I would rather be working in an HDP environment.

    #14926

    James, I have done this and it does work, though it requires a bit of additional setup on your client machine. For best results I suggest using the Hortonworks Talend bundle because it ships with a bunch of built-in workflows.

    First, download the Hortonworks-branded Talend Open Studio for Big Data from hortonworks.com/download, because it includes a number of sample workflows in the demos/HDPDEMOS.zip file. Import these into Talend by right-clicking under Job Designs, clicking “Select archive file”, and then selecting that file.

    After importing you will have a set of jobs under Job Designs and an entry called HDP under Contexts. Edit the context group and view the values as a table. Next, update the values for namenode_host, templeton_host, etc. to the IP address of your sandbox, and change the user to sandbox.

    Many of the jobs also require a variable called hdfs_test_dir which needs to be changed to something like /user/sandbox/test.
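
    To keep the context edits straight, it can help to write the values down before typing them into the table. The Python sketch below only builds and prints them; it does not talk to Talend. The function name `hdp_context_values` and the key `user` are assumptions for illustration (the thread names namenode_host, templeton_host and hdfs_test_dir, but not the exact name of the user variable), and any other host variables in your context group would need the same IP.

```python
def hdp_context_values(sandbox_ip: str, user: str = "sandbox") -> dict:
    """Build the values to type into the HDP context group (illustrative helper)."""
    return {
        "namenode_host": sandbox_ip,
        "templeton_host": sandbox_ip,
        # Assumed variable name; check what your context table calls it.
        "user": user,
        # Test directory under the user's HDFS home, as suggested in the thread.
        "hdfs_test_dir": f"/user/{user}/test",
    }


if __name__ == "__main__":
    # 172.16.1.100 is the placeholder IP used in this thread; use your own.
    for key, value in hdp_context_values("172.16.1.100").items():
        print(f"{key} = {value}")
```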

    Finally, many of the Hadoop actions within Talend require the client to be able to resolve the DNS name sandbox to the IP address of the sandbox. Not all actions require this, but many do. To get everything to work you will need to edit your computer’s “hosts” file and add an entry for sandbox.

    It would look something like:
    172.16.1.100 sandbox

    There are various resources online for editing hosts files if you’ve not done this before. You would substitute in your own sandbox IP in place of this one. This part is pretty ugly and we’re looking to eliminate the need to do it in the future.
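
    If you would rather script the hosts change than edit the file by hand, here is a minimal Python sketch of the text transformation. The function name `add_hosts_entry` is made up for this example. It works on the file contents as a string and leaves the actual read and write to you, since the hosts file (typically /etc/hosts on macOS and Linux, C:\Windows\System32\drivers\etc\hosts on Windows) needs administrator rights to modify.

```python
def add_hosts_entry(hosts_text: str, ip: str, name: str = "sandbox") -> str:
    """Return hosts_text with exactly one `ip name` mapping for name."""
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking for host names on the line.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are its names.
        if name in fields[1:]:
            # Drop a stale line that already maps this name
            # (any other aliases on that line are dropped with it).
            continue
        kept.append(line)
    kept.append(f"{ip} {name}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n"
    # 172.16.1.100 is the placeholder IP from this thread; use your own.
    print(add_hosts_entry(sample, "172.16.1.100"), end="")
```

    Running it again with a different IP replaces the old sandbox line instead of adding a second one, which is handy because the sandbox IP can change between VM restarts.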

    Hope that helps.

    #14912

    Sasha J
    Moderator

    James,
    To connect Talend to HDFS on your Sandbox, you have to point it at the NameNode:
    use the IP address the Sandbox displays on its screen, and port 8020.

    Hope this helps.

    Thank you!
    Sasha
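
    Before configuring anything in Talend, it may be worth checking that the NameNode port is reachable from your client machine at all. The sketch below is a plain TCP check in Python using only the standard library. The function name `namenode_reachable` is made up for this example, and a successful connection only shows that something is listening on the port, not that HDFS is healthy.

```python
import socket
import sys


def namenode_reachable(host: str, port: int = 8020, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Pass your sandbox IP (or the name sandbox) as the first argument.
    if len(sys.argv) > 1:
        print(namenode_reachable(sys.argv[1]))
```

    If this prints False for the IP but the Sandbox VM is running, the usual suspects are the VM's network mode or a firewall on either side.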
