Hortonworks Sandbox Forum

How to get started coding?

  • #50776
    Clement Fleury
    Participant

    Hi there!
    First of all, I must warn you : I am VERY new to Hadoop.

    I have installed the Hortonworks Sandbox and did the first 6 tutorials (I am a Linux user, so no Excel for me…), so I think I now have a good grasp of Pig, Hive and HCatalog. I have also read a lot about HDFS and MapReduce, and I really think I get the core Hadoop concepts.

    I also installed successfully HDP2 with Ambari on a remote virtual machine (Proxmox).

    BUT : what now?

    I want to develop my own Java application that uses HDP2 cluster.

    I’m developing on my workstation (Eclipse on Ubuntu).

    How to get started ?
    What plugins / libraries do I have to install?
    How do I have to structure my code ?
    How do I get my local program to be executed on my HDP2 ?
    … ?

    I am so new to this that I don’t even know if I’m asking the right questions to the right people.
    Any tip, help, hint, step-by-step tutorial, link, … will be appreciated.

    Cheers!


  • #51126
    Sanjeev
    Moderator

    Hi Clement,

    Please find some answers as below:

    How to get started ?
    Once you have Eclipse set up, you start your Java application just as you would when developing any normal Java application.

    What plugins / libraries do I have to install ?
    There are some plugins available on the market to try out, but to keep it simple you can just add the required libraries to your project.
    For instance, if you are using the HDFS APIs, the jars are usually available at /usr/lib/hadoop/lib; for Hive the location is /usr/lib/hive/lib, and so on.
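    To make this concrete, here is a minimal sketch of reading a file through the HDFS Java API. It assumes the hadoop-common and hadoop-hdfs jars (e.g. from /usr/lib/hadoop and /usr/lib/hadoop/lib) are on your Eclipse build path; the NameNode host and file path below are placeholders you would replace with your own cluster's values.

    ```java
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the remote cluster; "your-namenode" is a placeholder.
            // If core-site.xml is on the classpath, this line is not needed.
            conf.set("fs.defaultFS", "hdfs://your-namenode:8020");

            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/example.txt"); // hypothetical file

            // fs.open() returns an InputStream over the HDFS file
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }
    ```

    Running this from your workstation requires that the NameNode's RPC port (8020 by default on HDP2) is reachable from your machine.
    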

    How do I have to structure my code ?
    Nothing specific is required here; a standard Java project layout works fine.

    How do I get my local program to be executed on my HDP2 ?
    Normally, you will bundle your Java code into a jar and use the “hadoop jar <jar-name>” command to run it.
    Additionally, setting up a remote debugging will make it easy to debug your code. Please see http://pravinchavan.wordpress.com/2013/04/05/remote-debugging-of-hadoop-job-with-eclipse/
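    As a sketch of what such a jar typically contains, here is the classic WordCount program (essentially the standard example from the Hadoop MapReduce documentation, assuming the hadoop-mapreduce-client jars are on your build path):

    ```java
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in each input line
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sums the counts for each word
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    ```

    Once exported to a jar (the jar name below is hypothetical), you run it on the cluster with something like: hadoop jar wordcount.jar WordCount /user/clement/input /user/clement/output
    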

    Hope this helps.

    Thanks
    Sanjeev

    #51133
    Clement Fleury
    Participant

    Hi!

    Thanks @Sanjeev , very useful information indeed.

    If I may make a suggestion to the Hortonworks (and Hadoop in general) community, as a real newbie: the HDP Sandbox tutorials are a really GREAT way to understand the main concepts of Hadoop (HDFS and MapReduce), as well as other tools (HCatalog, Hive, Pig, …).
    BUT, I think there is a really huge gap between these tutorials and the moment when you are actually writing real code for Hadoop.

    I found only one useful (to me, as a rookie: very detailed and up to date) resource: https://github.com/hortonworks/hadoop-tutorials/blob/master/Community/T09_Write_And_Run_Your_Own_MapReduce_Java_Program_Poll_Result_Analysis.md

    If you have any other such pointers, please share!

    Cheers!

