HOW TO: Connect/Write a File to Hortonworks Sandbox from Talend Studio

Writing a file to Hortonworks Sandbox from Talend Studio

I recently needed to quickly build some test data for my Hadoop environment and was looking for a tool to help me out. It turns out this is a very simple process within Talend Studio (you can get the latest Talend Studio from their site).

Here is how…

Step 1 – Generating Test Data within Talend Studio

  • Create a New Job within the Job Designer
  • Drag a tRowGenerator onto the Designer
  • Double-click your tRowGenerator component and add the fields you want to generate
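
If you want a feel for what the tRowGenerator step produces before opening Talend, here is a minimal Python sketch of the same idea. It is not what Talend runs internally. The three fields (id, name, amount), the function names and the semicolon delimiter are all assumptions I picked for illustration.

```python
import csv
import io
import random
import string


def generate_rows(count, seed=None):
    """Yield `count` rows of made-up test data as (id, name, amount) tuples.

    The schema is hypothetical; swap in whatever fields you need.
    """
    rng = random.Random(seed)  # a seed makes the data reproducible
    for row_id in range(1, count + 1):
        name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        amount = round(rng.uniform(0, 1000), 2)
        yield (row_id, name, amount)


def rows_to_delimited(rows, delimiter=";"):
    """Serialize rows to delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print five sample records to the console.
    print(rows_to_delimited(generate_rows(5, seed=42)), end="")
```

Passing the same seed twice gives the same rows, which is handy when you want repeatable test files.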

Step 2 – Connecting to HDFS from Talend

  • Drag a tHDFSConnection onto the Designer
  • Change the “Name Node URI” property to point to your Hortonworks Sandbox on port 8020.
  • Change the connection user name to “sandbox”.
  • Right-click on the tHDFSConnection and add an OK trigger that connects the tHDFSConnection to the tRowGenerator
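
If the connection fails, it can help to confirm that the Name Node port is reachable from your machine before digging into Talend settings. The sketch below builds the URI in the same form used in this post and tries a plain TCP connection to port 8020. The function names are my own, and 192.0.2.10 is only a placeholder address, so substitute your sandbox IP.

```python
import socket


def name_node_uri(host, port=8020):
    """Build the Name Node URI in the form the Talend property expects."""
    return f"hdfs://{host}:{port}/"


def is_port_open(host, port=8020, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    host = "192.0.2.10"  # placeholder; use your sandbox IP
    print(name_node_uri(host))
    print("reachable" if is_port_open(host) else "not reachable")
```

A successful connection only shows that something is listening on the port; it does not prove the user name or the rest of the HDFS configuration is right.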

Step 3 – Writing to HDFS

  • Drag a tHDFSOutput onto the Designer
  • Change the “Name Node URI” property to point to your Hortonworks Sandbox on port 8020. Example: “hdfs://<YOUR SANDBOX IP>:8020/”
  • Change the connection user name to “sandbox”.
  • Set the name of the output file in the File Name field
  • Right-click on the tRowGenerator and add a Row > Main connection that links the tRowGenerator to the tHDFSOutput
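
For comparison, the same write can be sketched outside Talend with Hadoop's WebHDFS REST API. This is not what tHDFSOutput does under the hood; it is an alternative route I am showing for illustration. It assumes WebHDFS is enabled on your sandbox and that the NameNode web port is 50070 (the Hadoop 2.x default), which is different from the 8020 port used above. The function names and the example file path are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_create_url(host, hdfs_path, user="sandbox", port=50070):
    """Build the WebHDFS CREATE URL for an absolute HDFS path."""
    query = urllib.parse.urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": "true"}
    )
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(hdfs_path)}?{query}"


def upload(host, hdfs_path, data, user="sandbox"):
    """Two-step WebHDFS upload: ask the NameNode where to write, then send the bytes."""
    first = urllib.request.Request(
        webhdfs_create_url(host, hdfs_path, user), method="PUT"
    )
    try:
        urllib.request.urlopen(first)
    except urllib.error.HTTPError as err:
        # urllib does not follow redirects for PUT, so the 307 surfaces here.
        if err.code != 307:
            raise
        location = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")
    # Second request carries the file content to the address we were given.
    second = urllib.request.Request(location, data=data, method="PUT")
    with urllib.request.urlopen(second) as response:
        return response.status  # 201 means the file was created


if __name__ == "__main__":
    # Only prints the URL; call upload(...) yourself against a live sandbox.
    print(webhdfs_create_url("192.0.2.10", "/user/sandbox/test.csv"))
```

One thing to watch for: the redirect may point at a hostname rather than an IP, so your machine has to be able to resolve whatever name the sandbox hands back.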

Step 4 – Running the Job from Talend

  • Click on the “Run” tab and press the “Run” button

Step 5 – Viewing the file in the Hortonworks Sandbox

  • Open your web browser and enter the URL: http://<YOUR SANDBOX IP>:8000
  • Click on the File Browser icon on the top bar
  • Your file should have appeared within the sandbox user’s home directory
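
If you would rather check from a script than from the browser, a WebHDFS directory listing is one option. Same caveats as any sketch: it assumes WebHDFS is enabled on port 50070 and that the sandbox user's home directory is /user/sandbox (the usual HDFS layout). The function names are mine.

```python
import json
import urllib.parse
import urllib.request


def liststatus_url(host, user="sandbox", port=50070):
    """Build the WebHDFS LISTSTATUS URL for the user's assumed home directory."""
    query = urllib.parse.urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1/user/{user}?{query}"


def file_names(liststatus_json):
    """Extract plain file names from a LISTSTATUS JSON response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses if s["type"] == "FILE"]


def list_home_files(host, user="sandbox"):
    """Fetch the listing from a live sandbox and return the file names."""
    with urllib.request.urlopen(liststatus_url(host, user)) as response:
        return file_names(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call list_home_files(...) against a live sandbox.
    print(liststatus_url("192.0.2.10"))
```

If the name you typed into the File Name field shows up in the returned list, the job wrote the file where you expected.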

VOILA! 

You can explore Hadoop with many more tutorials in the Hortonworks Sandbox.
