

MapReduce issue

  • #29833

    Hi,

    I am new to Hadoop.
    While practicing with a custom MapReduce program, I found that the result is not as expected when the code runs against a file on HDFS. When I execute the same program against a file on the local (Unix) file system, I get the expected result.
    Below are the details of my code.

    MapReduce in Java
    =================

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordCount1 {

        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                // Split each input line into tokens and emit (word, 1) for each token.
                String line = value.toString();
                String tokenedZone = null;
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    tokenedZone = tokenizer.nextToken();
                    word.set(tokenedZone);
                    output.collect(word, one);
                }
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                // Sum the counts for this word and emit it only if it occurs more than once.
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                if (sum > 1)
                    output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            conf.setJarByClass(WordCount1.class);
            conf.setJobName("wordcount1");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);  // the reducer is also used as the combiner
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            Path inPath = new Path(args[0]);
            Path outPath = new Path(args[1]);

            FileInputFormat.setInputPaths(conf, inPath);
            FileOutputFormat.setOutputPath(conf, outPath);

            JobClient.runJob(conf);
        }
    }

    Input file
    ==========
    test my program
    during test and my hadoop
    your during
    get program

    Hadoop-generated output file on the HDFS file system
    =====================================================
    during 2
    my 2
    test 2

    Hadoop-generated output file on the local file system
    ======================================================
    during 2
    my 2
    program 2
    test 2

    Please help me with this issue.

  • #29865
    tedr
    Moderator

    Hi Mullangi,

    Check that the file in HDFS actually matches the file on the local system and has not been truncated/corrupted in some way.

    Thanks,
    Ted.

    #29933

    Hi Ted,

    This is still happening even after I changed to another input file.

    Thanks,
    Ram

    #29939
    tedr
    Moderator

    Hi Mullangi,

    Do the files in HDFS have the same data in them? You can verify this with the command 'hadoop fs -cat <filename>', where '<filename>' is the full path to the file in HDFS, including the file name itself.

    Thanks,
    Ted.
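
    For reference, the same check can be scripted with the Hadoop FileSystem API instead of the shell. The sketch below is illustrative (the class name is an assumption, not part of this thread); it prints a file from whatever file system the configuration points at, so its output can be diffed against the local copy:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CatFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);       // HDFS when the default FS points at it
            Path path = new Path(args[0]);              // path to the file to inspect
            BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);               // compare this output with the local file
            }
            reader.close();
        }
    }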

    #31670
    Member

    Hello, I have a question: in my program I get an error at JobClient.runJob(conf);.
    P.S. I use hadoop-0.19.1.
    Thank you

    #33399
    abdelrahman
    Moderator

    Hi,

    Please start with the basic MapReduce word-count program, and alter the configuration later as you see fit. Can you post the complete error?

    http://wiki.apache.org/hadoop/WordCount

    Thanks
    -Abdelrahman
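
    One detail worth comparing against that wiki example: its reducer emits every word unconditionally, which is what makes it safe to reuse as a combiner. A minimal sketch along the lines of that page's reducer (old mapred API, imports as in the WordCount1 code above):

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();   // total count for this word
            }
            output.collect(key, new IntWritable(sum));   // no filter: emit every word
        }
    }

    Because a combiner runs on per-mapper partial sums, a reducer that filters on the total, such as the if (sum > 1) check in the code above, can silently drop a word whose occurrences are spread across mappers. That kind of filter is one plausible explanation for output that differs between a local run and a distributed run.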
