Testing Mahout

This topic contains 0 replies, has 1 voice, and was last updated by Wyman 1 year, 9 months ago.

    Wyman
    Member

    Hi,
    I tried to test our deployment of HDP with Mahout using the Wikipedia Bayes example: https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
    I was able to download the Wikipedia data set and load it into HDFS, but when trying step 4 I get:
    D:\user\mahout-0.7.0.1.3.0.0-0380>bin\mahout wikipediaXMLSPlitter -d D:\user\enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
    "Mahout home set C:\hadoop\\mahout-0.7.0.1.3.0.0-0380"
    MAHOUT_JOB: C:\hadoop\\mahout-0.7.0.1.3.0.0-0380\examples\target\mahout-examples-0.7.0.1.3.0.0-0380-job.jar
    13/10/07 18:23:07 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSPlitter
    13/10/07 18:23:08 WARN driver.MahoutDriver: No wikipediaXMLSPlitter.props found on classpath, will use command-line arguments only
    Unknown program 'wikipediaXMLSPlitter' chosen.
    It looks like another jar file is needed.
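
Before hunting for a jar, one thing that may be worth ruling out is the spelling of the program name: the linked wiki example writes it as wikipediaXMLSplitter (lowercase "p"), while the command above has wikipediaXMLSPlitter, and the driver most likely matches names case-sensitively. A minimal sketch of that check follows; the KNOWN_PROGRAMS list and the case_mismatches helper are hypothetical and only for illustration, since the real set of names comes from the Mahout install and may differ by version.

```python
# Hypothetical list of short program names, for illustration only.
# The first spelling is the one used in the linked wiki example; the
# actual set registered with the Mahout driver may differ by version.
KNOWN_PROGRAMS = ["wikipediaXMLSplitter", "wikipediaDataSetCreator"]


def case_mismatches(typed, known=KNOWN_PROGRAMS):
    """Return known names that equal `typed` only when case is ignored."""
    return [
        name for name in known
        if name != typed and name.lower() == typed.lower()
    ]


if __name__ == "__main__":
    # Name exactly as typed in the failing command above.
    print(case_mismatches("wikipediaXMLSPlitter"))
```

If the check reports a near match, rerunning the same command with that spelling would be a quick way to test the theory; if it still fails, the missing-jar explanation remains open.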
