HDP Analyst: Data Science

Overview

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, and scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course first.


Target Audience


Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Hadoop.

1 Day

An Introduction to Hadoop and Data Science

Objectives

  • Using Hadoop for Data Science
  • The Hadoop Distributed File System
  • The MapReduce Framework
  • Hadoop 2 and YARN
  • Machine Learning from Data

Labs

  • Setting up the Lab Environment
  • Using HDFS Commands
  • Demonstration: Understanding MapReduce
  • Using Apache Mahout for Machine Learning

An Introduction to Apache Pig and Python

Machine Learning Algorithms
