Applying Data Science using Apache Hadoop

Applying Data Science using Hadoop covers Data Science principles and techniques through lectures and hands-on exercises. During this two-day class, students will work in a hands-on learning environment to practice the processes of data analysis with Hadoop and the R statistical language. The class culminates in implementing a recommender solution with R and Mahout.

Duration

2 days

Prerequisites

Students must have basic computer skills, a basic knowledge of statistics, and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, or R, although helpful, is not required.

Target Audience

Architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.

Course Objectives

  • Understand the basics of Data Science
  • Understand the basics of machine learning
  • Learn about Hadoop and its relation to Data Science
  • Learn the basics of the R statistical language, using the Revolution Analytics distribution of R
  • Understand recommender systems
  • Implement a recommender system with the R statistical language
  • Implement a recommender system with Hadoop (using Mahout)

Lab Content

Students will work through the following exercises using the R statistical language and the Hortonworks Data Platform:

  • Hands-on setup of the solution environment
  • Defining the problem
  • Fundamentals of R
  • Data analysis using R
  • Creating the user/item matrix
  • Using recommenderlab with R
  • Running Mahout with Hadoop
  • Mahout ALS (alternating least squares) and evaluation
  • Data product design diagram

Pricing

  • $2295
  • Students who complete a paid reservation at least two weeks prior to the start of the course will enjoy a 10% discount
  • Note that discounts cannot be combined

Please contact us if you have any questions about Apache Hadoop training courses or would like to discuss a custom, on-site training course.