In baseball, Hadoop could provide insights traditional stats can’t

The "Moneyball" phenomenon in baseball is one of the most frequently cited, high-profile examples of an industry using data-driven insights to improve performance. But the way professional baseball teams handle statistics is changing as they try to account for less tangible variables. Experts have predicted that teams may soon begin using Hadoop clusters to make use of the unstructured data that accompanies traditional stats.

In a recent interview with CNBC, Paul DePodesta, vice president of player development and scouting for the New York Mets, explained that baseball's data-driven focus is only increasing and becoming more complex. As interest in baseball data has grown, so has the amount of information teams must process to make use of it.

Spotting substantive trends that accurately forecast player behavior is a major challenge, particularly as teams try to separate skill from luck in their analysis. According to TechCrunch contributor and venture capitalist Barry Eggers, that challenge has led at least one major league team to consider building a small Hadoop cluster. A future in which teams have locker room data scientists running in-game queries in HBase may not be far off.

"So why would a baseball organization need a Hadoop cluster?" Eggers wrote. "Because unstructured data may unlock insights that are not apparent from the structured event data that is available to every team."

