Note: This tutorial was validated with Sandbox 1.3.
Cascading Pattern is a machine learning project within Cascading, a development framework used to build enterprise data workflows. Cascading provides an abstraction layer on top of Hadoop and other computing topologies, so enterprises can use their existing skills and resources to build data processing applications on Apache Hadoop without specialized Hadoop expertise. Pattern, in particular, builds on an industry standard called Predictive Model Markup Language (PMML). With PMML, data scientists can use their preferred statistical and analytics tools, such as R or Oracle, to export predictive models and quickly run them on data sets stored in Hadoop. Pattern's benefits include lower development costs, time savings, and fewer licensing issues at scale, while making use of Hadoop clusters, the core competencies of analytics staff, and the existing intellectual property in the predictive models.
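To make the idea of an exported model concrete, here is a minimal sketch of what a PMML document carries and how a consumer might read it. It does not use Pattern or Cascading; it only parses a small, hand-written PMML regression model with Python's standard library and scores one record. The field names, coefficients, and the helper functions `load_linear_model` and `score` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document (not exported by any real tool).
# It describes a linear regression: prediction = intercept + sum(coef * field).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="sepal_length"/>
      <MiningField name="sepal_width"/>
      <MiningField name="petal_length" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-2.5">
      <NumericPredictor name="sepal_length" coefficient="1.75"/>
      <NumericPredictor name="sepal_width" coefficient="-1.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace used by the PMML 4.1 schema.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Return (intercept, {field name: coefficient}) from a PMML regression model."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model to one record (a dict of field name -> value)."""
    return intercept + sum(coef * record[name] for name, coef in coefficients.items())


intercept, coefficients = load_linear_model(PMML_DOC)
print(score({"sepal_length": 4.0, "sepal_width": 2.0}, intercept, coefficients))
```

The point of the sketch is that the model travels as a plain XML description, independent of the tool that trained it. Pattern plays the role of the consumer here, applying the model to data stored in Hadoop rather than to a single in-memory record.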
- Get the Hortonworks Sandbox
- Install JDK 1.6 (details in tutorial)
- Install Gradle (details in tutorial)
- Install R and RStudio
- Review the tutorial
This tutorial is intended for data scientists and data analysts who have intermediate experience with statistical modeling tools such as SAS, R, or MicroStrategy, and novice-level Java experience.
Try this tutorial with:
These tutorials are designed to work with Sandbox, a simple way to get started with Hadoop. Sandbox offers a full HDP environment that runs in a virtual machine.