
Hortonworks Sandbox Forum

How to get started coding?

  • #50776
    Clement Fleury

    Hi there!
First of all, I must warn you: I am VERY new to Hadoop.

I have installed the Hortonworks sandbox and completed the first six tutorials (I am a Linux user, so no Excel for me…), so I think I now understand Pig, Hive and HCatalog quite well. I have also read a lot about HDFS and MapReduce, and I really think I get the core concepts in Hadoop.

I also successfully installed HDP2 with Ambari on a remote virtual machine (Proxmox).

    BUT : what now?

I want to develop my own Java application that uses an HDP2 cluster.

    I’m developing on my workstation (Eclipse on Ubuntu).

How do I get started?
What plugins / libraries do I have to install?
How do I have to structure my code?
How do I get my local program to be executed on my HDP2 cluster?
…?

I am so new to this that I don’t even know if I’m asking the right questions to the right people.
Any tip, help, hint, step-by-step tutorial, link, … will be appreciated.


  • Author
  • #51126

    Hi Clement,

Please find some answers below:

How do I get started?
Once you have Eclipse set up, you start your Java application just as you would any normal Java application.

What plugins / libraries do I have to install?
There are some plugins available on the market to try out, but to keep it simple you can just add the required libraries to your project.
For instance, if you are using the HDFS APIs, the JARs are usually available at /usr/lib/hadoop/lib; for Hive the location is /usr/lib/hive/lib, and so on.
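If you are building outside the sandbox, the same libraries can be pulled in with Maven rather than copying JARs off the cluster. A sketch of the relevant dependencies — the version numbers are illustrative and should be matched to your actual HDP2 release:

```xml
<!-- Illustrative Maven dependencies; pin the versions to your HDP2 release -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.12.0</version>
</dependency>
```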

How do I have to structure my code?
Nothing specific here.

How do I get my local program to be executed on my HDP2?
Normally, you will bundle your Java code into a JAR and use the “hadoop jar <jar-name>” command to run it.
Additionally, setting up remote debugging will make it easier to debug your code. Please see
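The JAR you submit with “hadoop jar” ultimately carries your map and reduce logic. As a cluster-free illustration of that pattern, here is the classic word-count map/reduce flow sketched with plain JDK streams — the class and method names are made up for this example, and this is not Hadoop’s actual Mapper/Reducer API:

```java
import java.util.*;
import java.util.stream.*;

/** Word-count map/reduce logic sketched with plain JDK streams (illustrative only). */
public class WordCountSketch {

    /** "Map" phase: split one input line into individual lowercase words. */
    static Stream<String> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty());
    }

    /** "Reduce" phase: group identical words and sum their occurrences. */
    static Map<String, Long> reduce(List<String> lines) {
        return lines.stream()
                    .flatMap(WordCountSketch::map)
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of("Hello Hadoop", "hello world")));
    }
}
```

In real Hadoop code the map and reduce steps run on different cluster nodes with a shuffle in between, but the shape of the logic is the same.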

    Hope this helps.


    Clement Fleury


Thanks @Sanjeev, very useful information indeed.

If I may make a suggestion to the Hortonworks (and Hadoop in general) community, as a real newbie: the HDP Sandbox tutorials are a really GREAT way to understand the main concepts of Hadoop (HDFS and MapReduce), as well as the other tools (HCatalog, Hive, Pig, …).
BUT I think there is a huge gap between these tutorials and the moment when you actually start writing real code for Hadoop.

I found only one useful (to me, as a rookie: very detailed and up to date) resource:

    If you have any other such pointers, please share!


The forum ‘Hortonworks Sandbox’ is closed to new topics and replies.
