This is a guest blog post from Gary Nakamura, CEO at our partner Concurrent, Inc., discussing Cascading Pattern and the new Hadoop tutorial they have written for the Hortonworks Sandbox. This is one of the first tutorials aimed at a more experienced crowd. Enjoy!
Cascading Pattern marks an important milestone for Cascading as we continue our mission of driving innovation and simplifying Big Data application development. We have leveraged the broad platform support of the Cascading application framework to offer a free, open source, standards-based scoring engine that enables analysts and data scientists to quickly deploy machine-learning applications on Apache Hadoop.
As many of us are all too aware, Hadoop is rapidly becoming the data store of choice for enterprise Big Data analytics. The need for Hadoop to integrate easily with existing data management and analytics systems, and to leverage existing enterprise skills, has grown dramatically over the past couple of years.
Cascading Pattern clears a critical path for Big Data applications by enabling data scientists to quickly bring their work to production data on Hadoop. When combined with the full Cascading offering, Pattern closes the modeling, development, and production loop for all data-oriented applications. The combination of ANSI SQL, Java, and PMML in a single application framework, Cascading, is a simple yet powerful ensemble, further enabling enterprises to drive differentiation through data.
Pattern is a machine-learning project within the Cascading development platform, which is used for building enterprise data workflows. Cascading provides an abstraction layer on top of Hadoop and other computing topologies that lets enterprises use existing skills and resources to build data processing applications on Apache Hadoop without specialized Hadoop expertise. Cascading Pattern, in particular, builds on an industry standard called the Predictive Model Markup Language (PMML), which allows data scientists to export predictive models from their favorite analytics tools, such as SAS, R, MicroStrategy, and Oracle, and very quickly run them on data sets stored in Hadoop. Benefits include greatly reduced development cost and time, and fewer licensing issues at scale, all while leveraging Hadoop clusters, the core competencies of data analytics and data science staff, and existing intellectual property in the predictive models.
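To make the PMML idea concrete, here is a minimal sketch of what a scoring engine does with an exported model: parse the PMML document and apply its parameters to incoming records. This is illustrative only, not Cascading Pattern's implementation (Pattern does this at scale inside Hadoop flows); the PMML fragment is a hand-written toy linear regression, and the field names `x1` and `x2` are made up.

```python
# Illustrative only: evaluate a toy PMML RegressionModel against one record.
# Real PMML exported from R or SAS carries a DataDictionary, mining schema,
# and richer model types; this sketch keeps just the regression table.
import xml.etree.ElementTree as ET

PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}

def score(pmml_text, record):
    """Apply a linear RegressionTable from a PMML document to one record."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("p:NumericPredictor", NS):
        result += float(pred.get("coefficient")) * record[pred.get("name")]
    return result

print(score(PMML, {"x1": 2.0, "x2": 4.0}))  # 1.5 + 2.0*2.0 - 0.5*4.0 = 3.5
```

Because the model travels as a standard XML document rather than as code, the same exported file can be scored by any PMML-aware engine, which is exactly the portability Pattern exploits on Hadoop.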
Cascading Pattern addresses a common use case that is particularly important for enterprise IT: “How do I quickly and easily deploy and test my predictive models on Hadoop?” Another common use of Cascading Pattern is customer experimentation, such as A/B testing or Multi-Armed Bandit approaches. The idea is that there may be multiple ways to construct a predictive model for a given problem, each with different trade-offs. This is a powerful proposition for enterprises driving to leverage their data and Hadoop in new and interesting ways.
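The experiment pattern mentioned above can be sketched as an epsilon-greedy bandit that routes traffic between two candidate models and gradually shifts toward whichever performs better. This is a generic sketch of the technique, not Cascading Pattern code; the model names and conversion rates below are hypothetical.

```python
# Epsilon-greedy multi-armed bandit over two candidate models.
# With probability epsilon we explore (pick a random arm); otherwise
# we exploit the arm with the best observed mean reward so far.
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Choose an arm name given pull counts and cumulative rewards."""
    if random.random() < epsilon:
        return random.choice(list(counts))
    return max(counts, key=lambda a: rewards[a] / counts[a] if counts[a] else 0.0)

def update(counts, rewards, arm, reward):
    counts[arm] += 1
    rewards[arm] += reward

random.seed(42)
counts = {"model_A": 0, "model_B": 0}
rewards = {"model_A": 0.0, "model_B": 0.0}
true_ctr = {"model_A": 0.05, "model_B": 0.11}  # hypothetical conversion rates

for _ in range(10_000):
    arm = epsilon_greedy(counts, rewards)
    update(counts, rewards, arm, 1.0 if random.random() < true_ctr[arm] else 0.0)

print(counts)  # the better-converting model ends up serving most traffic
```

In a Pattern deployment the "arms" would be alternative PMML models scored side by side on the same Hadoop data, with the reward signal coming from downstream business metrics.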
Try the Cascading Pattern tutorial with the Hortonworks Sandbox. The tutorial is a step-by-step guide to exporting your models to PMML and deploying them on Hadoop.