April 25, 2017

Five Missteps to Avoid on your First Big Data Journey

You have heard about Big Data for a long time, and how companies that use Big Data as part of their business decision-making process experience significantly higher profitability than their competition.

Now that your company is ready to embark on its first Apache Hadoop® journey, there are important lessons to be learned. Read on and learn how to avoid the pitfalls and missteps so many companies fall into.

Pitfall 1: We Are Going To Start Small

It is natural for companies in general, and IT organizations in particular, to start their Big Data journeys under conditions where they can manage the risk by determining the viability of the technology. However, we have learned that the more data you have, the higher the likelihood of finding new and exciting insights.

In case after case, the size of the initial cluster is a good predictor of the success of the first Hadoop project. In other words, businesses that start out with clusters of ten nodes or fewer generally do not have sufficient data in their Hadoop environment to uncover significant insights.

Best Practice: Start out with a cluster of at least 30 nodes. Outline business objectives and then bring in as much data as your infrastructure can comfortably store and process to meet them.

Pitfall 2: Build It And They Shall Come

Another common mistake that companies make is to build their Hadoop cluster without having a clear objective that is connected to deriving real business value. It is true that a number of companies start out with the objective to reduce the operational cost of their existing data infrastructure by moving that data into Hadoop. However, the cost benefits of such projects are largely limited to IT organizations.

To make a positive impact on your company’s revenues, profitability or competitive leverage through Big Data, you must partner with the business to come up with concrete use cases that will drive such results. These use cases must outline the key business metrics and identify the data sources and processing steps required to achieve the desired business results.

Best Practice: Start out with a use case built around achieving concrete business results. Even when building a prototype, keep an eye on rolling it out to production. Succeed or fail quickly and communicate success to the broader organization.

Pitfall 3: We Need To Hire A Team Of People With Hadoop Background

Many companies at the start of their Hadoop journeys hire an architect simply to install and configure their Hadoop cluster. A Hadoop architect is an expensive resource whose expertise is better utilized down the road, when security architecture, governance procedures and IT processes need to be operationalized.

Hadoop is a unique technology that cuts across infrastructure, applications, and business transformation. It is ideal to have a Hadoop-centric practice that is part of the broader analytics organization. However, finding personnel with a background in Hadoop infrastructure and its various components is a tall order. Hadoop requires a unique set of skills that few companies have in place at the onset of their journey.

The first step in your Hadoop journey is to invest in a resident architect (RA) with deep subject matter expertise and practical experience from your Hadoop platform provider. An RA should work onsite with your team to address the topics and tasks your organization needs assistance with. The second step to building a Hadoop practice is to leverage existing skills in your IT organization through training and RA guidance.

The data architects, business analysts, developers and administrators who are part of every IT organization can be trained on the various components of the Hadoop platform and, with guidance from the RA, acquire the skills required to manage your Hadoop infrastructure.

Best Practice: Get the services of an on-site resident architect at the onset of your Hadoop journey. Leverage existing skills in your IT team and invest in training to build a Hadoop practice in your organization.

Pitfall 4: We Are Not Big Enough To Have A Center Of Excellence

As you start your journey, Hadoop is going to be a technology that your organization will have little experience with. Successful companies that navigate this journey tend to establish a cross-business Center of Excellence (COE). The COE enables companies to share thought leadership, best practices, research and development, process and governance, support and training for all lines of business across the enterprise. The COE also helps build an enterprise-level data and analytics capabilities roadmap that is invaluable for planning purposes.

It is a misperception that only Fortune 500 companies have the resources to invest in COEs. In fact, a Big Data COE is simply an organization that is typically staffed by an Executive Sponsor, Project Manager, Enterprise Architect, Technical Lead, Business Lead, and Business-to-Technical Liaison. The COE members meet on a regular basis to set standards and best practices, deploy new tools, and establish strategies for data governance and security.

Best Practice: Build a Center of Excellence to establish consistent processes and standards and to foster communication between business and IT.

Pitfall 5: It’s Open Source — We Can Do It Ourselves

Companies commonly underestimate the importance of support and training as they embark on their first Hadoop journey. Many assume that because Hadoop is an open source technology with freely available code, their developers can learn it on their own. Various Hadoop distributions provide a platform that businesses can leverage to build applications that help them get value from their data.

Support and training:

  • Enable businesses to focus on building the applications that will provide them with competitive advantage rather than investing in plumbing or the platform.
  • Provide ongoing application maintenance. Most developers are excited about working on cutting-edge technology to build new applications, but very few are interested in maintaining those applications.
  • Protect your company from mission-critical issues that may impact your applications, with the security of 24/7 support.
  • Optimize your current and upcoming initiatives with best practices.
  • Provide companies with access to self-paced learning and the ability to gain knowledge by interacting with experienced support engineers and committers.

Best Practice: Focus on building value-added business applications by using a pre-built Hadoop platform. Procure a support subscription from a leading Hadoop distribution to protect yourself from downtime and other issues and to optimize your Hadoop initiatives with best practices. Invest in training and professional services.

Read the full 8-page white paper. Download the complete Five Missteps to Avoid on Your First Big Data Journey today.
