HDP Developer: Java

Overview

This advanced course provides Java programmers with a deep dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, and joins of large datasets, as well as how to unit test MapReduce jobs and develop UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.
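
As an illustration of one of these topics, the sketch below shows the general shape of a custom partitioner in the Hadoop Java API. It is our own minimal example, not course material: the class name FirstLetterPartitioner is hypothetical, and a partitioner that genuinely avoids skew would typically be driven by sampling or known key frequencies rather than a fixed rule.

// Minimal custom Partitioner sketch (hypothetical, not from the course labs).
// It routes keys to reducers by their first character instead of the default
// hash of the whole key.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;                       // route empty keys to the first reducer
    }
    int firstChar = Character.toLowerCase(key.charAt(0));
    return (firstChar & Integer.MAX_VALUE) % numPartitions;
  }
}

A job would opt into it with job.setPartitionerClass(FirstLetterPartitioner.class).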

Duration

4 days

Format

50% Lecture/Discussion
50% Hands-on Labs

Prerequisites

Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Target Audience

Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Course Schedule

Hortonworks University provides an immersive and valuable real-world experience through scenario-based training courses. Our classes are available both in the classroom and online, from anywhere in the world.

Course Objectives

At the completion of the course, students will be able to:

Describe Hadoop 2.X and the Hadoop Distributed File System
Describe the YARN framework
Develop and run a Java MapReduce application on YARN (a minimal sketch with a combiner follows this list)
Use combiners and in-map aggregation
Write a custom partitioner to avoid data skew on reducers
Perform a secondary sort
Recognize use cases for built-in input and output formats
Write a custom MapReduce input and output format
Optimize a MapReduce job
Configure MapReduce to optimize mappers and reducers
Develop a custom RawComparator class
Distribute files as LocalResources
Describe and perform join techniques in Hadoop
Perform unit tests using the MRUnit API
Describe the basic architecture of HBase
Write an HBase MapReduce application
List use cases for Pig and Hive
Write a simple Pig script to explore and transform big data
Write a Pig UDF (User-Defined Function) in Java
Write a Hive UDF in Java
Use the JobControl class to create a MapReduce workflow
Use Oozie to define and schedule workflows
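
As a minimal sketch of the MapReduce and combiner objectives above (our own example under assumed class names, not an official course lab), the job below counts words and reuses its reducer as a combiner so that map output is pre-aggregated before the shuffle:

// WordCountJob: a hypothetical, minimal MapReduce job with a combiner.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {

  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);     // emit (word, 1) for each token
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : counts) {
        sum += count.get();
      }
      total.set(sum);
      context.write(key, total);      // final (word, count) pair
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wordcount");
    job.setJarByClass(WordCountJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);   // combiner pre-aggregates map output
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, such a job would be submitted to the cluster with yarn jar wordcount.jar WordCountJob <input> <output>, where the JAR name and paths are placeholders.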

Lab Content

Students will work through the following lab exercises using Eclipse, Maven, and the Hortonworks Data Platform 2.X:

Configuring a Hadoop Development Environment
Putting data into HDFS using Java
Writing a distributed grep MapReduce application
Writing an inverted index MapReduce application
Configuring and using a combiner
Writing custom combiners and partitioners
Globally sorting output using the TotalOrderPartitioner
Writing a MapReduce job to sort data using a composite key
Writing a custom InputFormat class
Writing a custom OutputFormat class
Computing a simple moving average of stock price data
Using data compression
Defining a RawComparator
Performing a map-side join
Using a Bloom filter
Unit testing a MapReduce job
Importing data into HBase
Writing an HBase MapReduce job
Writing User-Defined Pig and Hive functions (a Hive UDF sketch follows this list)
Defining an Oozie workflow
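
To give a flavor of the Pig and Hive UDF lab above, here is our own minimal sketch of an old-style Hive UDF in Java; the class name ToUpperUDF is hypothetical, and the actual lab code may differ:

// A hypothetical Hive UDF that upper-cases a string.
// Hive resolves the evaluate() method by reflection.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class ToUpperUDF extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null;                    // null in, null out
    }
    return new Text(input.toString().toUpperCase());
  }
}

After packaging it into a JAR, the function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.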

Certification

The demand for Big Data skills is increasing every day. Hortonworks offers a comprehensive certification program to help establish your credentials. Get Trained, Get Certified, Get Hired!

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for IT professionals involved in implementing big data solutions.