Big Data needs Data Science
Implementing a modern data architecture with Hadoop means that Hadoop must deeply integrate with existing technologies, leverage existing skills and investments, and provide key services. This guest post from David Smith, Vice President of Marketing and Community at Revolution Analytics, shares his perspective on the role of Data Scientists in a Big Data world.
While many companies today have now figured out the process of efficiently collecting and storing data with Hadoop, the next step in that process, getting value out of the data, isn’t yet in everyone’s grasp. But that’s where Data Science (the process of understanding, analyzing and creating actionable insights from data) comes in. You can use Data Science to measure regional market share, understand consumer sentiment, predict elections, and even make better wine. If you’re new to the concepts of Data Science, here are some resources to help you catch up:
- Learn the history of Data Science. While the name “Data Science” has only been a term of currency since 2009, statisticians have been getting insight from data for centuries. (Today, most statisticians consider themselves data scientists as well.) This presentation by Carlos Somohano gives a great overview of the history of data science.
- Get started with public data. Getting insight from your own data is the goal, but if you’re just getting started you might want to start with (and have easier access to) public data. Here are some free, public Big Data sets you can download today.
- Learn the R language. While R used to be thought of as a small-data tool, Revolution Analytics has extended R for big data, and today it is a preferred tool of data scientists. This free e-book on data science with R is a great place to get started. And don’t forget about the 5000+ packages that extend R: here are 10 R packages every data scientist should know.
- Understand how to make Data Science actionable. Data Science isn’t the same thing as BI. In fact, Data Science brings new focus to business intelligence: rather than providing general-purpose tools to business decision makers, consider creating targeted “data apps” that put the expertise of data scientists into the hands of end-users to solve specific problems.
If you’d like to learn more about the data scientist’s toolkit, you might want to view my 2012 talk “The Rise of Data Science in the Age of Big Data Analytics”.
And don’t forget to join the new webinar The Modern Data Architecture for Predictive Analytics with Revolution Analytics and Hortonworks Data Platform on Tuesday to learn how R and Hadoop form part of a data scientist’s architecture.
Get Started using Hadoop to Analyze Data. This guide includes tutorials, videos and advice on integrating Hadoop with popular analytics packages.