Hadoop R Streaming

This topic contains 3 replies, has 4 voices, and was last updated by  cubetoon 1 year, 2 months ago.

  • Creator
    Topic
  • #22536

    Yang Song
    Member

    I have a question about Hadoop streaming. In this version of HDP for Windows, can I use the streaming method to run an R script as a MapReduce job? If so, how do I do it, and what command should I use?

  • Author
    Replies
  • #53197

    cubetoon
    Participant

    I worked through a similar challenge recently and documented the idiosyncrasies of HDP on Windows on my blog:

    http://www.cubetoon.com/2014/hadoop-streaming-r-on-hortonworks-windows-distribution/

    #22562

    Paul Codding
    Participant

    Hi Yang,

    Your best option for using R for MapReduce would be the RHadoop packages available on GitHub. These packages are created by Revolution Analytics and provide tools for working with MapReduce, HDFS, and HBase from R. You can find them here:

    https://github.com/RevolutionAnalytics/RHadoop/wiki

    #22560

    Sasha J
    Moderator

    Yang,
    Thank you for using HDP!
    You definitely can use streaming for executing R scripts.
    However, I do not recall how to do this off the top of my head; I need to do some research.

    Thank you!
    Sasha
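
For readers wondering what "using streaming" involves: Hadoop Streaming runs any executable as the mapper or reducer, feeding it records on standard input and reading tab-separated key/value lines from standard output. The sketch below is not from this thread. It is a local Python simulation of that contract with a word count, and the function names (`map_line`, `reduce_sorted`, `run_local`) are hypothetical. An R script used with streaming would need to follow the same read-from-stdin, write-to-stdout pattern; the exact command line and jar location on HDP for Windows are not covered here.

```python
from itertools import groupby


def map_line(line):
    """Mapper step: emit one 'word<TAB>1' record per whitespace-separated word."""
    return [f"{word}\t1" for word in line.split()]


def record_key(record):
    """Return the key part of a 'key<TAB>value' record."""
    return record.split("\t", 1)[0]


def reduce_sorted(records):
    """Reducer step: sum the counts for each key.

    Assumes records arrive grouped by key, which is what the
    framework's sort phase provides between map and reduce.
    """
    out = []
    for key, group in groupby(records, key=record_key):
        total = sum(int(record.split("\t", 1)[1]) for record in group)
        out.append(f"{key}\t{total}")
    return out


def run_local(lines):
    """Simulate map -> sort -> reduce in one process (no Hadoop involved)."""
    mapped = [record for line in lines for record in map_line(line)]
    return reduce_sorted(sorted(mapped, key=record_key))


if __name__ == "__main__":
    # Fixed sample input; a real streaming script would read sys.stdin instead.
    for record in run_local(["a b a", "b c"]):
        print(record)
```

In an actual job the mapper and reducer would be two separate scripts, each looping over standard input, with Hadoop performing the sort between them.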
