HDP Developer: Apache Pig and Hive

Overview

This four-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.

Duration

4 days

Format

50% Lecture/Discussion
50% Hands-on Labs

Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Target Audience

Software developers who need to understand and develop applications for Hadoop.

Course Schedule

Hortonworks University provides an immersive, real-world experience through scenario-based training courses. Our classes are available both in the classroom and online, from anywhere in the world.

COURSE OBJECTIVES

At the completion of the course, students will be able to:

Describe Hadoop, YARN and use cases for Hadoop
Describe Hadoop ecosystem tools and frameworks
Describe the HDFS architecture
Use the Hadoop client to input data into HDFS
Transfer data between Hadoop and a relational database
Explain YARN and MapReduce architectures
Run a MapReduce job on YARN
Use Pig to explore and transform data in HDFS
Understand how Hive tables are defined and implemented
Use Hive to explore and analyze data sets
Create and populate a Hive table that uses ORC file formats
Explain and use the various Hive file formats
Use the new Hive windowing functions
Use Hive to run SQL-like queries to perform data analysis
Use Hive to join datasets using a variety of techniques
Write efficient Hive queries
Perform data analytics using the DataFu Pig library
Explain the uses and purpose of HCatalog
Use HCatalog with Pig and Hive
Define and schedule an Oozie workflow
Present the Spark ecosystem and high-level architecture
Perform data analysis with Spark's Resilient Distributed Dataset API
Explore Spark SQL and the DataFrame API

LAB CONTENT

Use HDFS commands to add/remove files and folders
Use Sqoop to transfer data between HDFS and an RDBMS
Run MapReduce and YARN application jobs
Explore, transform, split and join datasets using Pig
Use Pig to transform and export a dataset for use with Hive
Use HCatLoader and HCatStorer
Use Hive to discover useful information in a dataset
Describe how Hive queries get executed as MapReduce jobs
Perform a join of two datasets with Hive
Use advanced Hive features: windowing, views, ORC files
Use Hive analytics functions
Write a custom reducer in Python
Analyze clickstream data and compute quantiles with DataFu
Use Hive to compute ngrams on Avro-formatted files
Define an Oozie workflow
Use Spark Core to read files and perform data analysis
Create and join DataFrames with Spark SQL

Certification

The demand for Big Data skills is increasing every day. Hortonworks offers a comprehensive certification program to help establish your credentials. Get trained, get certified, get hired!

Hortonworks University

Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for IT professionals involved in implementing big data solutions.