HDP Developer: Java

Overview

This course gives Java programmers a deep dive into Hadoop application development. Students will learn how to design and develop efficient MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, and custom input and output formats, how to join large datasets, how to unit test MapReduce code, and how to develop UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Duration

4 days

Format

50% Lecture/Discussion
50% Hands-on Labs

Prerequisites

Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Target Audience

Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Course Schedule

Hortonworks University provides immersive, real-world experience through scenario-based training courses. Our classes are available both in the classroom and online, from anywhere in the world.

Course Objectives

At the completion of the course, students will be able to:

* Describe Hadoop 2.X and the Hadoop Distributed File System
* Describe the YARN framework
* Develop and run a Java MapReduce application on YARN
* Use combiners and in-map aggregation
* Write a custom partitioner to avoid data skew on reducers
* Perform a secondary sort
* Recognize use cases for built-in input and output formats
* Write a custom MapReduce input and output format
* Optimize a MapReduce job
* Configure MapReduce to optimize mappers and reducers
* Develop a custom RawComparator class
* Distribute files as LocalResources
* Describe and perform join techniques in Hadoop
* Perform unit tests using the MRUnit API
* Describe the basic architecture of HBase
* Write an HBase MapReduce application
* List use cases for Pig and Hive
* Write a simple Pig script to explore and transform big data
* Write a Pig UDF (User-Defined Function) in Java
* Write a Hive UDF in Java
* Use the JobControl class to create a MapReduce workflow
* Use Oozie to define and schedule workflows
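
As a rough illustration of two of the objectives above (in-map aggregation and partitioning), the following plain-Java sketch may help. It is not course material and does not use the Hadoop API: the class and method names are hypothetical, and the input lines stand in for one mapper's input split. The partition arithmetic mirrors the rule used by Hadoop's default hash partitioner; a custom partitioner would replace that rule when a few hot keys overload one reducer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only. All names are hypothetical and nothing
// here depends on the Hadoop libraries.
final class AggregationSketch {

    // Naive map step: emits one (word, 1) pair for every token.
    static List<Map.Entry<String, Integer>> emitPerToken(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(Map.entry(token, 1));
                }
            }
        }
        return pairs;
    }

    // In-map aggregation: counts are summed in memory while the split is
    // processed, so only one (word, total) pair per distinct word is emitted.
    static Map<String, Integer> aggregateInMap(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Hash partitioning: masks off the sign bit so the result is never
    // negative, then takes the remainder by the number of reducers.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        List<String> split = List.of("to be or not to be", "to see or not to see");
        System.out.println("pairs without aggregation: " + emitPerToken(split).size());
        System.out.println("pairs with aggregation:    " + aggregateInMap(split).size());
        System.out.println("partition for 'be' of 4:   " + hashPartition("be", 4));
    }
}
```

The trade-off in the aggregating version is memory: the in-memory map grows with the number of distinct keys in the split, which is why a real mapper typically bounds or flushes it.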

Lab Content

Students will work through the following lab exercises using Eclipse, Maven, and the Hortonworks Data Platform 2.X:

* Configuring a Hadoop Development Environment
* Putting data into HDFS using Java
* Write a distributed grep MapReduce application
* Write an inverted index MapReduce application
* Configure and use a combiner
* Writing custom combiners and partitioners
* Globally sort output using the TotalOrderPartitioner
* Writing a MapReduce job to sort data using a composite key
* Writing a custom InputFormat class
* Writing a custom OutputFormat class
* Compute a simple moving average of stock price data
* Use data compression
* Define a RawComparator
* Perform a map-side join
* Using a Bloom filter
* Unit testing a MapReduce job
* Importing data into HBase
* Writing an HBase MapReduce job
* Writing User-Defined Pig and Hive functions
* Defining an Oozie workflow
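
To give a feel for two of the labs listed above (sorting on a composite key and computing a simple moving average of stock prices), here is a minimal plain-Java sketch. It is not the lab solution and does not use the Hadoop API: the `Quote` type, its fields, and the sample data are assumptions made up for illustration. The comparator orders records by symbol and then by day, which is the ordering a composite key is meant to provide to a reducer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Plain-Java illustration only. All names and data are hypothetical.
final class MovingAverageSketch {

    // Hypothetical record: one closing price for a symbol on a given day.
    static final class Quote {
        final String symbol;
        final int day;
        final double close;

        Quote(String symbol, int day, double close) {
            this.symbol = symbol;
            this.day = day;
            this.close = close;
        }
    }

    // Composite-key ordering: natural key (symbol) first, then day.
    static final Comparator<Quote> COMPOSITE_ORDER =
            Comparator.comparing((Quote q) -> q.symbol).thenComparingInt(q -> q.day);

    // Sorts by the composite key, then averages each trailing `window`
    // of closing prices for the requested symbol.
    static List<Double> movingAverage(List<Quote> quotes, String symbol, int window) {
        List<Quote> sorted = new ArrayList<>(quotes);
        sorted.sort(COMPOSITE_ORDER);

        List<Double> averages = new ArrayList<>();
        Deque<Double> recent = new ArrayDeque<>();
        double sum = 0.0;
        for (Quote q : sorted) {
            if (!q.symbol.equals(symbol)) {
                continue;
            }
            recent.addLast(q.close);
            sum += q.close;
            if (recent.size() > window) {
                sum -= recent.removeFirst(); // drop the oldest price
            }
            if (recent.size() == window) {
                averages.add(sum / window);
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Quote> quotes = List.of(
                new Quote("X", 3, 30.0),
                new Quote("X", 1, 10.0),
                new Quote("Y", 1, 100.0),
                new Quote("X", 2, 20.0),
                new Quote("X", 4, 40.0));
        System.out.println(movingAverage(quotes, "X", 2)); // prints [15.0, 25.0, 35.0]
    }
}
```

In a MapReduce job the sort would be done by the framework during the shuffle rather than in the reducer, which is the point of putting the day into the key.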

Certification

The demand for Big Data skills is increasing every day. Hortonworks offers a comprehensive certification program to help establish your credentials. Get trained, get certified, get hired!

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for IT professionals involved in implementing big data solutions.