Change default Block Size

This topic contains 5 replies, has 3 voices, and was last updated by Yi Zhang 1 year, 6 months ago.

#16055

    zero quake
    Member

    Hello,
Is it possible to change the default block size in Hadoop in the sandbox? And is it possible to specify the block size for a task?


Replies

#16130

    Yi Zhang
    Moderator

    Hi Zero,

Good that you figured it out. The fun of Hadoop is playing with it!

    Yi

#16129

    zero quake
    Member

I'm running it using the Pig interface, and yes, I changed the default block size in hadoop-site.xml, which is located in the /etc/hadoop/conf folder, and it worked. My bad, I was trying it out on the same table without creating a new one again (the block size only applies to files written after the change). Newbie issues :P Thank you :)

#16125

    Yi Zhang
    Moderator

    Hi Zero Quake,

How are you running the job? Is it a hadoop jar job, or Hive, Pig, etc.?
For example, if you are running a hadoop jar job, you can put all the job-specific configuration in a conf file and use -conf to load it; alternatively, you can use -D to define each property:

hadoop jar testjob.jar -conf your-job-specific-file -D mapred.map.tasks=xx ...
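
For illustration, a minimal sketch of what such a job-specific conf file might contain, following the standard Hadoop configuration format (the file name and values here are hypothetical):

<?xml version="1.0"?>
<!-- your-job-specific-file: hypothetical per-job overrides -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <!-- 8 MB; applies to files this job writes -->
    <value>8388608</value>
  </property>
</configuration>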

Here is a helpful link about the number of map/reduce tasks:

    http://wiki.apache.org/hadoop/HowManyMapsAndReduces

If the sandbox VM has an IP that is accessible, then you can ssh into it. You can check or edit its network configuration to suit your needs.
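
As a sketch, assuming the default VirtualBox NAT port forwarding the sandbox ships with (the address, port, and credentials may differ in your setup):

ssh root@127.0.0.1 -p 2222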

    Hope this helps,
    Yi

#16122

    zero quake
    Member

    SET mapred.min.split.size 20000;
    SET mapred.max.split.size 50000;
    SET pig.SplitCombination false;
    SET default_parallel 30;
    SET dfs.block.size 8388608;

Tried these; the number of maps didn't change. I tried to vi into hdfs-site.xml, but for some reason it doesn't let me properly edit it. Can we ssh into the VM from outside and try it out? Any other solutions?

Thank you Hortonworks for the sandbox; I learnt a lot more through that than from 2 days of reading random tutorials.
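
A sketch of why the settings above had no effect: dfs.block.size only applies to files written by a job, so the data has to be re-stored before a new block size changes the input splits (note also that the split-combination property name is case-sensitive: pig.splitCombination). A minimal Pig sketch, with hypothetical paths:

-- re-store the data so the smaller block size applies to the new copy
SET dfs.block.size 8388608;  -- 8 MB; affects files this script writes
data = LOAD '/user/sandbox/mytable' USING PigStorage('\t');
STORE data INTO '/user/sandbox/mytable_8mb' USING PigStorage('\t');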

#16112

    Larry Liu
    Moderator

    Hi, Zero

The parameter "mapred.max.split.size", which can be set per job individually, is what you're looking for. If you want to change the default block size, I think you can do that as well: try editing hdfs-site.xml for the property dfs.block.size.
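
For illustration, a minimal sketch of that property in hdfs-site.xml (the 8 MB value is just an example; the Hadoop 1.x default is 67108864, i.e. 64 MB):

<property>
  <name>dfs.block.size</name>
  <value>8388608</value>
</property>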

    Thanks

    Larry
