HDP Analyst: Data Science Training

Overview

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, Scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib. Download the data sheet to view the full list of course objectives and labs.

Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.


Target Audience


Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

Day 1

An Introduction to Data Science, Python, Hadoop and Machine Learning

Objectives

  • Define Data Science and Explain What a Data Scientist Does
  • Differentiate Between Different Types of Data Roles
  • List a Number of Data Science Use Cases
  • Present an Overview of Python
  • Describe the Components of the Big Data Scientific Stack

Labs

  • Using IPython
  • Data Analysis with Python
  • Using HDFS Commands
  • Introduction to Spark REPLs and Zeppelin
  • Using Apache Mahout for Machine Learning

Working with Spark RDDs, DataFrames and SparkSQL, Visualization in Zeppelin

Machine Learning Algorithms, Natural Language Processing, and Spark MLlib
