Cascading Pattern

Quickly migrate Predictive Models (PMML) onto Hadoop & deploy them at scale

Note: this tutorial was validated with Sandbox 1.3

Introduction

Cascading Pattern is a machine learning project within Cascading, a development framework for building enterprise data workflows. Cascading provides an abstraction layer on top of Hadoop and other computing topologies, so enterprises can use their existing skills and resources to build data processing applications on Apache Hadoop without specialized Hadoop expertise. Pattern builds on an industry standard, the Predictive Model Markup Language (PMML), which lets data scientists export predictive models from their preferred statistical and analytics tools, such as R or Oracle, and quickly run them against data sets stored in Hadoop. Pattern's benefits include lower development costs, time savings, and fewer licensing issues at scale, all while making use of existing Hadoop clusters, the core competencies of analytics staff, and the intellectual property already captured in the predictive models.
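
To make the PMML idea more concrete: a PMML file is an XML document that describes a model's input fields and the model itself. The sketch below is not part of Pattern or of the tutorial. It is a minimal illustration in Python (standard library only) of how one might inspect an exported PMML file before moving it to Hadoop. The embedded PMML fragment, the field names, and the summarize_pmml helper are all hypothetical and written by hand for this example; a real export from R or another tool would be considerably larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment for illustration only.
# A real export from a modeling tool would contain a complete model definition.
SAMPLE_PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="iris_tree" functionName="classification">
    <MiningSchema>
      <MiningField name="sepal_length"/>
      <MiningField name="petal_length"/>
      <MiningField name="species" usageType="predicted"/>
    </MiningSchema>
    <Node score="setosa"><True/></Node>
  </TreeModel>
</PMML>
"""

# Top-level PMML elements that are not model definitions.
NON_MODEL_ELEMENTS = {
    "Header", "DataDictionary", "TransformationDictionary",
    "MiningBuildTask", "Extension",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(pmml_text):
    """Return the PMML version, declared data fields, and model element names."""
    root = ET.fromstring(pmml_text)
    fields = []
    models = []
    for child in root:
        name = local_name(child.tag)
        if name == "DataDictionary":
            # Collect (field name, data type) pairs from the data dictionary.
            fields = [
                (f.get("name"), f.get("dataType"))
                for f in child
                if local_name(f.tag) == "DataField"
            ]
        elif name not in NON_MODEL_ELEMENTS:
            # Anything else at the top level is treated as a model element.
            models.append(name)
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize_pmml(SAMPLE_PMML)
print(summary)
```

A quick check like this can confirm that the field names in an exported model match the columns of the data set it will score, which is an assumption worth verifying before deploying any model at scale.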

Get Started

  1. Get the Hortonworks Sandbox
  2. Install JDK 1.6 (details in tutorial)
  3. Install Gradle (details in tutorial)
  4. Install R and RStudio
  5. Review the tutorial

Target Audience

Data scientists or data analysts with intermediate experience in statistical modeling tools such as SAS, R, or MicroStrategy, and novice-level Java experience.

These tutorials are designed to work with the Sandbox, a simple way to get started with Hadoop. The Sandbox provides a full HDP environment that runs in a virtual machine.
