How to get started coding?


This topic contains 2 replies, has 2 voices, and was last updated by Clement Fleury 5 months, 2 weeks ago.

  • Creator
    Topic
  • #50776

    Clement Fleury
    Participant

    Hi there!
    First of all, I must warn you: I am VERY new to Hadoop.

    I have installed the Hortonworks Sandbox and did the first 6 tutorials (I am a Linux user, so no Excel for me…), so I think I now understand Pig, Hive and HCatalog quite well. I have also read a lot about HDFS and MapReduce, and I really think I get the core concepts of Hadoop.

    I also successfully installed HDP2 with Ambari on a remote virtual machine (Proxmox).

    BUT : what now?

    I want to develop my own Java application that uses the HDP2 cluster.

    I’m developing on my workstation (Eclipse on Ubuntu).

    How to get started?
    What plugins / libraries do I have to install?
    How do I have to structure my code?
    How do I get my local program to be executed on my HDP2 cluster?
    …?

    I am so new to this that I don’t even know if I’m asking the right questions to the right people.
    Any tip, help, hint, step-by-step tutorial, link, … will be appreciated.

    Cheers!



  • Author
    Replies
  • #51133

    Clement Fleury
    Participant

    Hi!

    Thanks @Sanjeev, very useful information indeed.

    If I may make a suggestion to the Hortonworks (and Hadoop in general) community, as a real newbie: the HDP Sandbox tutorials are a really GREAT way to understand the main concepts of Hadoop (HDFS and MapReduce), as well as the other tools (HCatalog, Hive, Pig, …).
    BUT I think there is a huge gap between these tutorials and the moment when you actually start writing real code for Hadoop.

    I found only one useful (to me, as a rookie: very detailed and up to date) resource: https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T09_Write_And_Run_Your_Own_MapReduce_Java_Program_Poll_Result_Analysis.md

    If you have any other such pointers, please share!

    Cheers!

    #51126

    Sanjeev
    Participant

    Hi Clement,

    Please find some answers below:

    How to get started?
    Once you have Eclipse set up, you start your Java application just like you would when developing a normal Java application.

    What plugins / libraries do I have to install?
    There are some plugins available (for Eclipse, for example) that you can try out, but to keep it simple you can just add the required libraries to your project.
    For instance, if you are using the HDFS APIs, the jars are usually available under /usr/lib/hadoop/lib. For Hive the location is /usr/lib/hive/lib, and so on.
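
    To make that concrete, here is a minimal sketch of talking to HDFS from a standalone Java client. It is only a sketch: it assumes the Hadoop client jars (for example those under /usr/lib/hadoop and /usr/lib/hadoop/lib) are on your classpath, and the NameNode URI hdfs://sandbox.hortonworks.com:8020 is a placeholder, so check the fs.defaultFS value in your cluster’s core-site.xml.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHelloWorld {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Placeholder NameNode address: take the real value from core-site.xml (fs.defaultFS)
            conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020");

            FileSystem fs = FileSystem.get(conf);

            // Write a small file into HDFS
            Path path = new Path("/tmp/hello.txt");
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("Hello HDFS");
            }

            // Read it back and print it
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }

            fs.close();
        }
    }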

    How do I have to structure my code?
    Nothing specific here; a typical MapReduce application just ends up as a mapper class, a reducer class and a small driver class that configures the job (see the sketch below).
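
    As a rough illustration (class names like WordCountMapper and WordCountReducer are just examples, not anything HDP-specific), here is the classic word-count mapper and reducer pair using the org.apache.hadoop.mapreduce API, each of which would normally live in its own .java file:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper: splits each input line into words and emits (word, 1)
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Reducer: sums the counts emitted for each word
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }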

    How do I get my local program to be executed on my HDP2 cluster?
    Normally, you will bundle your Java code into a jar and use the “hadoop jar <jar-name>” command to run it.
    Additionally, setting up remote debugging will make it easier to debug your code. Please see http://pravinchavan.wordpress.com/2013/04/05/remote-debugging-of-hadoop-job-with-eclipse/
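
    The piece that ties this together is a driver class with a main() method; that is the class you name when you run “hadoop jar”. Here is a sketch that reuses the example mapper and reducer above (again, the class names and the input/output paths are only illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Driver: configures the MapReduce job and submits it to the cluster
    public class WordCountDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = Job.getInstance(getConf(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner handles generic options such as -D key=value before calling run()
            System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
        }
    }

    Once that is packaged into a jar (say wordcount.jar), you would submit it from the sandbox or a cluster node with something like “hadoop jar wordcount.jar WordCountDriver /input/dir /output/dir”.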

    Hope this helps.

    Thanks
    Sanjeev
