The annual Strata Data Conference will make its next stop in New York the week of September 25-29. The core theme of the conference this year is driving business transformation through the power of data. And in the world of data, few topics excite as much as data science, machine learning, and deep learning. And with good reason! The combination of the Hadoop big data world and the much older world of data science and machine learning is a perfect marriage.
While Apache Hadoop has been around for a decade or more, it was only from 2011 onwards, once it was packaged into a platform, that adoption really started taking off.
Data at Rest: Hadoop brought us the concept of the Data Lake to manage the growing repositories of data at rest. But it remained a batch processing world until YARN became the Data Operating System of the Hadoop platform. YARN enabled multiple data engines or workloads to co-exist on the cluster – all accessing the same Data Lake and not a copy. Now we had batch and SQL, and others quickly followed.
Data in Motion: The explosion in IoT devices and use cases required better ways to move data from the edge to our Data Lake – with full security, lineage and provenance. Apache NiFi came to the fore as that data transportation and logistics layer. But it wasn’t just about data movement – the emerging use cases for real-time analytics challenged the traditional concept of real-time, which was about how fast we could move the data from its point of origin to our place of analytics. Think CDC (change data capture) and other approaches. The real answer came when we started pushing the analytics down to the edge, where the data is created! Stream processing!
Connected Data Platforms: The next frontier was not just to manage Data at Rest and Data in Motion, but to do so on premises, in the cloud, on another cloud, and across all of them combined. Now I can run multiple workloads (batch, Hive, Spark) on all my data, at rest and in motion, and have the freedom to run them where I want.
In a recent blog post on Data Science on HDP, Vinay Shukla and Huzefa Hakim describe some of the benefits that data science disciplines gain when combined with big data. In my session on September 27th at 5:25 pm (location 1E 17), I will talk about the next part of the journey: bringing the two worlds of Hadoop and data science together at scale to deliver business outcomes in a connected world.
Come visit the Hortonworks team at our booth in the exhibition hall (Booth #601, M148, M149) to see our Connected Data Platforms in action. You can also see the Hortonworks Data Science solution, which is powered by HDP and IBM Data Science Experience.