Hortonworks Sandbox Forum

Change default Block Size

  • #16055
    zero quake

    Is it possible to change the default block size in Hadoop in the sandbox? And is it possible to specify a block size per task?


  • Author
  • #16112
    Larry Liu

    Hi, Zero

    The parameter “mapred.max.split.size”, which can be set per job, is what you're looking for. If you want to change the default block size, you can do that as well: try editing hdfs-site.xml and setting the property dfs.block.size.
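
    Something like this in hdfs-site.xml should do it (a sketch; dfs.block.size is the Hadoop 1.x property name the sandbox uses, the value is in bytes, and it only applies to files written after the change):

    <property>
      <name>dfs.block.size</name>
      <value>8388608</value> <!-- 8 MB -->
    </property>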



    zero quake

    SET mapred.min.split.size 20000;
    SET mapred.max.split.size 50000;
    SET pig.splitCombination false;
    SET default_parallel 30;
    SET dfs.block.size 8388608;

    Tried these, but the number of maps didn't change. I also tried to vi into hdfs-site.xml, but for some reason it doesn't let me edit it properly. Can we ssh into the VM from outside and try it there? Any other solutions?

    Thank you Hortonworks for the sandbox, I learnt a lot more through that than from 2 days of reading random tutorials.

    Yi Zhang

    Hi Zero Quake,

    How are you running the job? Is it a hadoop jar job, or Hive, Pig, etc.?
    For example, if you are running a hadoop jar job, you can put all the job-specific configuration in a job-specific conf file and load it with -conf; alternatively, you can use -D to define each property:

    hadoop jar testjob.jar -conf your-job-specific-file -D mapred.map.tasks=xx …
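
    A job-specific conf file is just a normal Hadoop configuration XML file, along these lines (a sketch; the values are placeholders):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>mapred.max.split.size</name>
        <value>50000</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <value>10</value> <!-- a hint to the framework, not a hard limit -->
      </property>
    </configuration>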

    A helpful link here about the number of map/reduce tasks:

    If the sandbox VM has an IP that is accessible, then you can ssh into it. You can check or edit its network configuration to suit your needs.
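
    For example, with the stock sandbox VirtualBox setup (an assumption; check your own VM's port-forwarding rules), the NAT rule usually maps local port 2222 to the VM's port 22, so something like this works:

    ssh root@127.0.0.1 -p 2222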

    Hope this helps,

    zero quake

    Running it using the Pig interface, and yes, I have changed the default block size in hadoop-site.xml, which is located in the /etc/hadoop/conf folder, and it worked. My bad, I was trying it out on the same table without creating a new one again. Newbie issues 😛 Thank you :)

    Yi Zhang

    Hi Zero,

    Good that you figured it out. The fun of hadoop is to play with it!
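
    By the way, if you ever want to double-check the block size, hadoop fs -stat with the %o format prints a file's block size in bytes, so you can write a new file and stat it (the path here is just an example):

    hadoop fs -stat %o /user/sandbox/part-00000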

