Today, systems are expected to be always on and to run efficiently. That is not simple to achieve, and most of us need help and advice from experts who have done it before. The good news is that a group of such experts is ready to share their experience at Hadoop Summit San Jose. To help you pick the best sessions, track committee chair Cindy Gross (Big Data AzureCAT, Microsoft) recommends the following three sessions:
Speakers: Sheetal Dolas and Chris Nauroth from Hortonworks
Hadoop has become a backbone of many enterprises. While it can do wonders for businesses, it can sometimes be overwhelming for its operators and users. Amateur as well as seasoned Hadoop operators are caught unaware by common pitfalls of deploying, tuning and operating a Hadoop cluster. Having spent 5+ years working with hundreds of Hadoop users, running clusters with thousands of nodes, managing tens of petabytes of data and running hundreds of thousands of tasks per day, we have seen how unintentional acts, suboptimal configurations and common mistakes have resulted in downtime, SLA violations, many hours of recovery operations and, in some cases, even data loss! Most of these traumas could easily have been avoided by applying easy-to-follow best practices that protect data and optimize performance. In this talk we present real-life stories, common pitfalls and, most importantly, strategies for correctly deploying and managing Hadoop clusters. The talk will empower users and help make their Hadoop journey more fulfilling and rewarding. We will also discuss SmartSense, which can identify latent problems in a cluster and provide recommendations so that an operator can fix them before they manifest as a service degradation or outage.
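To make the "suboptimal configurations cause data loss" point concrete, here is an illustrative (not session-endorsed) fragment of `hdfs-site.xml` showing two settings that operators commonly get wrong; the property names are real HDFS configuration keys, while the values are only an example of a conservative choice:

```xml
<configuration>
  <!-- Write the NameNode's fsimage and edits to more than one disk;
       a single name directory is a classic single point of failure. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
  </property>
  <!-- Allow a DataNode to survive one failed disk instead of
       taking the whole node (and its replicas) offline. -->
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>
</configuration>
```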
Speakers: Vivek Madani and Karthik Karuppaiya from Symantec
When you are building a large-scale Big Data analytics cloud platform and thousands of engineers want to come on board to develop applications on it, how do you manage it? In this talk we will show how we enabled developers at Symantec to create, share and destroy clusters at will for development purposes. We will demonstrate:
1. How Cloudbreak is used for our next-generation self-service analytics clusters, built on our OpenStack cloud and AWS.
2. Our contributions to Cloudbreak for Keystone v3 and native OpenStack API support.
3. How we manage all of our services through Ambari, including our contributions to Ambari custom services.
4. How we built data pipelines using Kafka and Storm to anonymize and replicate partial data sets into these clusters for developers to work with.
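The anonymize-and-replicate step in point 4 can be sketched as a small transformation that a Storm bolt might apply to each record consumed from Kafka. This is a minimal sketch, not Symantec's implementation: the field names in `PII_FIELDS` and the event schema are hypothetical, and the Kafka/Storm plumbing is replaced by a plain function call.

```python
import hashlib
import json

# Hypothetical PII field names; the talk does not specify the real schema.
PII_FIELDS = {"email", "hostname", "username"}

def anonymize(record: dict) -> dict:
    """Return a copy of the record with PII values replaced by SHA-256 digests."""
    out = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            out[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            out[key] = value
    return out

# In a real pipeline this would run per tuple: consume from a production
# Kafka topic, anonymize, then emit to the development cluster's topic.
event = {"email": "user@example.com", "event_type": "scan", "severity": 3}
clean = anonymize(event)
print(json.dumps(clean))
```

Hashing rather than dropping the fields keeps the data joinable (the same email always maps to the same digest), which matters when developers need realistic partial data sets.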
Speakers: Eric Krenz and Dylan Bernhardt from Target
This project creates a continuous integration suite for testing a Hadoop cluster's infrastructure as code (IaC). It does so by taking existing standard Test Kitchen tests further. We use Jenkins to create a virtual cluster and run production cookbooks on it. Next, using Ambari blueprints, we install Hadoop along with all of our production services. Finally, we run an extensive suite of tests on the cluster before tearing it down. Along with this platform integration suite, the project provides an on-demand isolated development environment that mimics our production cluster; we provide access to, and manage, these environments through an API and a front-end application. Challenges we have overcome include: operating-system-level security and virtual disk emulation with OpenStack Cinder, keeping our cookbooks flexible between virtual and bare-metal deployments, identifying bottlenecks in the cluster, providing real-time production data on demand to users, and implementing a process to consistently secure the cluster with Kerberos from scratch. Future plans include the ability to change allocated resources in real time using OpenStack Heat orchestration, and an easy-to-use, web-driven way to share and persist data between virtual clusters and production.
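The blueprint step above relies on Ambari's declarative cluster definitions. As an illustrative sketch (the blueprint name, stack version and component layout here are assumptions, not Target's actual configuration), a minimal Ambari blueprint looks like this:

```json
{
  "Blueprints": {
    "blueprint_name": "ci-test-cluster",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "worker",
      "cardinality": "3",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" }
      ]
    }
  ]
}
```

Because the blueprint is just JSON posted to Ambari's REST API, a Jenkins job can stand up the same topology on every run, which is what makes tearing the cluster down after each test cheap.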
The Cloud and Operations track covers the core practices and patterns for planning, deploying, and managing Hadoop clusters, from on-premises to cloud. It also covers best practices for loading, moving, and managing data workflows. Sessions range from getting started and operating your cluster to cutting-edge best practices for large-scale deployments.
We hope to see you at these sessions; remember that you need to register to attend Hadoop Summit San Jose.