HDFS forum: HDP 2.0 performance increase

This topic contains 3 replies, has 2 voices, and was last updated by  Dharanikumar Bodla 7 months, 1 week ago.

  • Creator
    Topic
  • #48676

    Dharanikumar Bodla
    Participant

    Hi all,

    I loaded a set of 22 documents in text format onto HDFS and ran a map/reduce job on them. The job took almost 30-40 minutes. How can I improve performance so that jobs like this finish as fast as possible, ideally in 2-3 minutes?
    Does Hadoop performance depend on YARN memory, on the map/reduce files, or on the system configuration? Please explain in detail.
    Thanks & regards,
    Bodla Dharani Kumar.

Viewing 3 replies - 1 through 3 (of 3 total)


  • Author
    Replies
  • #49282

    Dharanikumar Bodla
    Participant

    14/02/26 11:55:27 INFO mapreduce.Job: Job job_1393395841113_0001 running in uber mode : false
    14/02/26 11:55:27 INFO mapreduce.Job: map 0% reduce 0%
    14/02/26 11:56:23 INFO mapreduce.Job: map 1% reduce 0%
    14/02/26 11:56:47 INFO mapreduce.Job: map 2% reduce 0%
    14/02/26 11:57:08 INFO mapreduce.Job: map 3% reduce 0%
    14/02/26 11:57:29 INFO mapreduce.Job: map 4% reduce 0%
    14/02/26 11:57:45 INFO mapreduce.Job: map 5% reduce 0%
    14/02/26 11:58:07 INFO mapreduce.Job: map 6% reduce 0%
    14/02/26 11:58:23 INFO mapreduce.Job: map 7% reduce 0%
    14/02/26 11:58:46 INFO mapreduce.Job: map 8% reduce 0%
    14/02/26 11:59:05 INFO mapreduce.Job: map 9% reduce 0%
    14/02/26 11:59:25 INFO mapreduce.Job: map 10% reduce 0%
    14/02/26 11:59:41 INFO mapreduce.Job: map 11% reduce 0%
    14/02/26 12:00:03 INFO mapreduce.Job: map 12% reduce 0%
    14/02/26 12:00:19 INFO mapreduce.Job: map 13% reduce 0%
    14/02/26 12:00:41 INFO mapreduce.Job: map 14% reduce 0%
    14/02/26 12:00:56 INFO mapreduce.Job: map 15% reduce 0%
    14/02/26 12:01:19 INFO mapreduce.Job: map 16% reduce 0%
    14/02/26 12:01:35 INFO mapreduce.Job: map 17% reduce 0%
    14/02/26 12:01:54 INFO mapreduce.Job: map 18% reduce 0%
    14/02/26 12:02:11 INFO mapreduce.Job: map 19% reduce 0%
    14/02/26 12:02:35 INFO mapreduce.Job: map 20% reduce 0%
    14/02/26 12:02:50 INFO mapreduce.Job: map 21% reduce 0%
    14/02/26 12:03:12 INFO mapreduce.Job: map 22% reduce 0%
    14/02/26 12:03:28 INFO mapreduce.Job: map 23% reduce 0%
    14/02/26 12:03:51 INFO mapreduce.Job: map 24% reduce 0%
    14/02/26 12:04:07 INFO mapreduce.Job: map 25% reduce 0%
    14/02/26 12:04:27 INFO mapreduce.Job: map 26% reduce 0%
    14/02/26 12:04:44 INFO mapreduce.Job: map 27% reduce 0%
    14/02/26 12:05:05 INFO mapreduce.Job: map 28% reduce 0%
    14/02/26 12:05:21 INFO mapreduce.Job: map 29% reduce 0%
    14/02/26 12:05:42 INFO mapreduce.Job: map 30% reduce 0%
    14/02/26 12:05:57 INFO mapreduce.Job: map 31% reduce 0%
    14/02/26 12:06:20 INFO mapreduce.Job: map 32% reduce 0%
    14/02/26 12:06:36 INFO mapreduce.Job: map 33% reduce 0%
    14/02/26 12:06:56 INFO mapreduce.Job: map 34% reduce 0%
    14/02/26 12:07:14 INFO mapreduce.Job: map 35% reduce 0%
    14/02/26 12:07:35 INFO mapreduce.Job: map 36% reduce 0%
    14/02/26 12:07:50 INFO mapreduce.Job: map 37% reduce 0%
    14/02/26 12:08:12 INFO mapreduce.Job: map 38% reduce 0%
    14/02/26 12:08:28 INFO mapreduce.Job: map 39% reduce 0%
    14/02/26 12:08:53 INFO mapreduce.Job: map 40% reduce 0%
    14/02/26 12:09:10 INFO mapreduce.Job: map 41% reduce 0%
    14/02/26 12:09:30 INFO mapreduce.Job: map 42% reduce 0%
    14/02/26 12:09:46 INFO mapreduce.Job: map 43% reduce 0%
    14/02/26 12:10:07 INFO mapreduce.Job: map 44% reduce 0%
    14/02/26 12:10:22 INFO mapreduce.Job: map 45% reduce 0%
    14/02/26 12:10:45 INFO mapreduce.Job: map 46% reduce 0%
    14/02/26 12:11:00 INFO mapreduce.Job: map 47% reduce 0%
    14/02/26 12:11:21 INFO mapreduce.Job: map 48% reduce 0%
    14/02/26 12:11:37 INFO mapreduce.Job: map 49% reduce 0%
    14/02/26 12:11:58 INFO mapreduce.Job: map 50% reduce 0%
    14/02/26 12:12:15 INFO mapreduce.Job: map 51% reduce 0%
    14/02/26 12:12:35 INFO mapreduce.Job: map 52% reduce 0%
    14/02/26 12:12:51 INFO mapreduce.Job: map 53% reduce 0%
    14/02/26 12:13:14 INFO mapreduce.Job: map 54% reduce 0%
    14/02/26 12:13:29 INFO mapreduce.Job: map 55% reduce 0%
    14/02/26 12:13:50 INFO mapreduce.Job: map 56% reduce 0%
    14/02/26 12:14:08 INFO mapreduce.Job: map 57% reduce 0%
    14/02/26 12:14:28 INFO mapreduce.Job: map 58% reduce 0%
    14/02/26 12:14:47 INFO mapreduce.Job: map 59% reduce 0%
    14/02/26 12:15:09 INFO mapreduce.Job: map 60% reduce 0%
    14/02/26 12:15:25 INFO mapreduce.Job: map 61% reduce 0%
    14/02/26 12:15:46 INFO mapreduce.Job: map 62% reduce 0%
    14/02/26 12:16:02 INFO mapreduce.Job: map 63% reduce 0%
    14/02/26 12:16:22 INFO mapreduce.Job: map 64% reduce 0%
    14/02/26 12:16:40 INFO mapreduce.Job: map 65% reduce 0%
    14/02/26 12:17:01 INFO mapreduce.Job: map 66% reduce 0%
    14/02/26 12:17:17 INFO mapreduce.Job: map 67% reduce 0%
    14/02/26 12:17:39 INFO mapreduce.Job: map 68% reduce 0%
    14/02/26 12:17:56 INFO mapreduce.Job: map 69% reduce 0%
    14/02/26 12:18:17 INFO mapreduce.Job: map 70% reduce 0%
    14/02/26 12:18:33 INFO mapreduce.Job: map 71% reduce 0%
    14/02/26 12:18:53 INFO mapreduce.Job: map 72% reduce 0%
    14/02/26 12:19:10 INFO mapreduce.Job: map 73% reduce 0%
    14/02/26 12:19:32 INFO mapreduce.Job: map 74% reduce 0%
    14/02/26 12:19:47 INFO mapreduce.Job: map 75% reduce 0%
    14/02/26 12:20:08 INFO mapreduce.Job: map 76% reduce 0%
    14/02/26 12:20:23 INFO mapreduce.Job: map 77% reduce 0%
    14/02/26 12:20:46 INFO mapreduce.Job: map 78% reduce 0%
    14/02/26 12:21:01 INFO mapreduce.Job: map 79% reduce 0%
    14/02/26 12:21:22 INFO mapreduce.Job: map 80% reduce 0%
    14/02/26 12:21:38 INFO mapreduce.Job: map 81% reduce 0%
    14/02/26 12:21:59 INFO mapreduce.Job: map 82% reduce 0%
    14/02/26 12:22:15 INFO mapreduce.Job: map 83% reduce 0%
    14/02/26 12:22:38 INFO mapreduce.Job: map 84% reduce 0%
    14/02/26 12:22:53 INFO mapreduce.Job: map 85% reduce 0%
    14/02/26 12:23:16 INFO mapreduce.Job: map 86% reduce 0%
    14/02/26 12:23:32 INFO mapreduce.Job: map 87% reduce 0%
    14/02/26 12:23:52 INFO mapreduce.Job: map 88% reduce 0%
    14/02/26 12:24:09 INFO mapreduce.Job: map 89% reduce 0%
    14/02/26 12:24:28 INFO mapreduce.Job: map 90% reduce 0%
    14/02/26 12:24:45 INFO mapreduce.Job: map 91% reduce 0%
    14/02/26 12:25:07 INFO mapreduce.Job: map 92% reduce 0%
    14/02/26 12:25:22 INFO mapreduce.Job: map 93% reduce 0%
    14/02/26 12:25:44 INFO mapreduce.Job: map 94% reduce 0%
    14/02/26 12:25:59 INFO mapreduce.Job: map 95% reduce 0%
    14/02/26 12:26:20 INFO mapreduce.Job: map 96% reduce 0%
    14/02/26 12:26:21 INFO mapreduce.Job: map 96% reduce 32%
    14/02/26 12:26:35 INFO mapreduce.Job: map 97% reduce 32%
    14/02/26 12:26:54 INFO mapreduce.Job: map 98% reduce 32%
    14/02/26 12:26:57 INFO mapreduce.Job: map 98% reduce 33%
    14/02/26 12:27:10 INFO mapreduce.Job: map 99% reduce 33%
    14/02/26 12:27:31 INFO mapreduce.Job: map 100% reduce 33%
    14/02/26 12:27:37 INFO mapreduce.Job: map 100% reduce 67%
    14/02/26 12:27:39 INFO mapreduce.Job: map 100% reduce 100%
    14/02/26 12:27:40 INFO mapreduce.Job: Job job_1393395841113_0001 completed successfully
    14/02/26 12:27:40 INFO mapreduce.Job: Counters: 44
    File System Counters
    FILE: Number of bytes read=4361403
    FILE: Number of bytes written=39133257
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=3392662
    HDFS: Number of bytes written=416139
    HDFS: Number of read operations=1053
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
    Job Counters
    Launched map tasks=350
    Launched reduce tasks=1
    Other local map tasks=2
    Data-local map tasks=348
    Total time spent by all maps in occupied slots (ms)=4139292
    Total time spent by all reduces in occupied slots (ms)=86314
    Map-Reduce Framework
    Map input records=79765
    Map output records=408125
    Map output bytes=3545128
    Map output materialized bytes=4363497
    Input split bytes=35805
    Combine input records=0
    Combine output records=0
    Reduce input groups=35996
    Reduce shuffle bytes=4363497
    Reduce input records=408125
    Reduce output records=35996
    Spilled Records=816250
    Shuffled Maps =350
    Failed Shuffles=0
    Merged Map outputs=350
    GC time elapsed (ms)=10318
    CPU time spent (ms)=273540
    Physical memory (bytes) snapshot=140794765312
    Virtual memory (bytes) snapshot=415248080896
    Total committed heap usage (bytes)=122652459008
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=3356857
    File Output Format Counters
    Bytes Written=416139
    14/02/26 12:27:40 INFO streaming.StreamJob: Output directory: /home/ambari-qa/6.txt
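
As a rough reading of the counters above, the sketch below divides the reported totals by the number of map tasks. All figures are copied from the log; the variable names are illustrative, and the "occupied slots" counter may be scaled by container size on some Hadoop versions, so the per-map time is only an approximation.

```python
from datetime import datetime

# Figures copied from the job log above.
launched_maps = 350
input_bytes = 3356857            # File Input Format Counters: Bytes Read
map_slot_ms = 4139292            # Total time spent by all maps in occupied slots (ms)
job_start = "14/02/26 11:55:27"  # first "map 0% reduce 0%" line
job_end = "14/02/26 12:27:40"    # "completed successfully" line

# Wall-clock duration of the job, taken from the log timestamps.
fmt = "%y/%m/%d %H:%M:%S"
wall_seconds = (
    datetime.strptime(job_end, fmt) - datetime.strptime(job_start, fmt)
).total_seconds()

# Average input size and average slot time for a single map task.
bytes_per_map = input_bytes / launched_maps
slot_seconds_per_map = map_slot_ms / launched_maps / 1000

print(f"wall-clock time: {wall_seconds / 60:.1f} min")
print(f"input per map task: {bytes_per_map / 1024:.1f} KiB")
print(f"slot time per map task: {slot_seconds_per_map:.1f} s")
```

Under these assumptions, each map task reads only about 9 KiB of input yet occupies a slot for roughly 12 seconds, which suggests that per-task startup overhead, rather than data volume, may account for most of the run time.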

    #49281

    Dharanikumar Bodla
    Participant

    Hi Koelli Mungee,
    Thanks for the reply. I am still facing the same problem. Please find the benchmark run below:

    [hdfs@s ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-76.jar -input hdfs:/apps/*.txt -output /home/ambari-qa/6.txt -mapper /home/coartha/mapper1.py -file /home/coartha/mapper1.py -reducer /home/coartha/reducer.py -file /home/coartha/reducer.py
    14/02/26 11:55:16 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
    packageJobJar: [/home/coartha/mapper1.py, /home/coartha/reducer.py] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-76.jar] /tmp/streamjob3070642236666676050.jar tmpDir=null
    14/02/26 11:55:17 INFO client.RMProxy: Connecting to ResourceManager at s.hadoop/192.168.2.121:8050
    14/02/26 11:55:17 INFO client.RMProxy: Connecting to ResourceManager at s.hadoop/192.168.2.121:8050
    14/02/26 11:55:18 INFO mapred.FileInputFormat: Total input paths to process : 350
    14/02/26 11:55:19 INFO mapreduce.JobSubmitter: number of splits:350
    14/02/26 11:55:19 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
    14/02/26 11:55:19 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    14/02/26 11:55:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393395841113_0001

    #49264

    Koelli Mungee
    Moderator

    Hi Dharani,

    A number of factors can contribute to performance, of course, such as the hardware and the load on the machine. Do you have any benchmarks for this test? Have you optimized your number of containers per node and RAM per container based on:

    http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html

    It is important to make sure you don't run too many containers per node, as that can cause bottlenecks and slow performance. Let me know if this helps,

    -Koelli
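
To make the sizing advice concrete, here is a minimal sketch of a containers-per-node and RAM-per-container calculation. The formula is an assumption based on the linked guide and should be checked against it. The function name and the example hardware figures (8 cores, 4 disks, 24 GB available to YARN, 1024 MB minimum container size) are hypothetical and do not come from this thread.

```python
def size_containers(cores, disks, available_ram_mb, min_container_mb):
    """Return (containers_per_node, ram_per_container_mb).

    Assumed formula, to be verified against the linked guide:
      containers = min(2 * cores, 1.8 * disks, available RAM / min container size)
      RAM per container = max(min container size, available RAM / containers)
    """
    containers = int(min(2 * cores, 1.8 * disks, available_ram_mb / min_container_mb))
    containers = max(containers, 1)  # always allow at least one container
    ram_per_container = max(min_container_mb, available_ram_mb // containers)
    return containers, ram_per_container


# Hypothetical node: 8 cores, 4 disks, 24 GB left for YARN after OS reservations.
containers, ram_mb = size_containers(
    cores=8, disks=4, available_ram_mb=24576, min_container_mb=1024
)
print(f"containers per node: {containers}, RAM per container: {ram_mb} MB")
```

In this hypothetical case the disk count is the limiting term, which illustrates why adding memory alone would not necessarily allow more containers per node.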
