Administering Apache Hadoop
This three-day Apache Hadoop training course is designed for administrators who deploy and manage Apache Hadoop clusters. This course will walk you through installation, provisioning and ongoing resource management within a Hadoop cluster. You will leave with a valuable set of best practices that have been developed over years of working with Hadoop in production so that you can optimize your Hadoop clusters.
We’ll work through the full lifecycle of an Apache Hadoop deployment using realistic hands-on lab experiments. You’ll learn from our experience over the past six years of working with some of the largest Hadoop clusters in the world.
After successfully completing this training course, each student will receive one free voucher for the Hortonworks Certified Apache Hadoop Administrator Exam.
The Hadoop Administration course spans three days and provides a solid foundation for management of your Hadoop clusters. A full outline is below.
In this course you will learn the best practices for Apache Hadoop administration as experienced by the developers and architects of core Apache Hadoop.
- How to size and deploy a cluster
- How to deploy a cluster for the first time
- How to perform ongoing maintenance to nodes in the cluster
- How to balance a cluster and tune its performance
- How to integrate status and health checks into your existing monitoring tools (single pane of glass)
- How to recover from a NameNode or DataNode failure
- How to implement a high availability solution
- Best practices for deploying Hadoop clusters
This course is designed for IT administrators and operators with at least basic knowledge of Linux. Existing knowledge of Hadoop is not required.
Day 1: Deployment: Sizing, deployment and provisioning
- Introduction to Hadoop
- Best Practices for Hadoop Cluster Hardware and Software
- Basic Hadoop Operations
- Installing Hadoop using Ambari
- Benchmarking Hadoop
- Creating a multi-user environment in Hadoop
- Understanding logs and directory structures in Hadoop
Day 2: Management: Management, monitoring and high availability
- Understanding configuration files
- Monitoring the cluster with Nagios and Ganglia
- Understanding dfsadmin and mradmin
- Understanding Schedulers in Apache Hadoop
- Data Integrity with Apache Hadoop
- Using Rack Topology
Day 3: Maintenance
- Commissioning and Decommissioning nodes
- NameNode Backup and Recovery
- Hadoop Security
- Copying Cluster Data
- Hadoop Archive
- Upgrading Hadoop
- Hadoop Ecosystem
- Oozie Administration
- HCatalog and Hive Administration
- Students who complete a paid reservation at least two weeks before the start of the course receive a 10% discount
- Note that discounts cannot be combined