HDFS Forum

Permission issues while running mr-jobs

  • #10739

    Hello,
I have set up a 3-node Hadoop cluster (using HDP 1.1). The namenode runs as the hdfs user and the jobtracker runs as the mapred user (both belong to supergroup). I am currently trying to run an MR job as root, and it gives me the following error:

2012-10-09 10:06:31,233 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=READ_EXECUTE, inode="system":mapred:supergroup:rwx------

This error arises when the MR job tries to write into the /tmp/hadoop-mapred/mapred/system directory, which has permission 700. To overcome this I tried loosening the permissions on this directory, but the Hadoop framework keeps complaining and tells me that I have to set the permission of this directory back to 700.

    hadoop dfs -ls /tmp/hadoop-mapred/mapred/
    Found 2 items
drwxrwxrwx - root supergroup 0 2012-10-09 10:03 /tmp/hadoop-mapred/mapred/staging
drwx------ - mapred supergroup 0 2012-10-09 10:03 /tmp/hadoop-mapred/mapred/system

Please let me know what I'm doing wrong here. I can't seem to run a MapReduce job as root.

    Thanks,
    Aishwarya


  • #10745
    Sasha J
    Moderator

In general, you should not run anything as root.
Run your job as the mapred user and it will execute normally.
The directory /tmp/hadoop-mapred/mapred/system is used by the JobTracker and no other users should touch it…
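
For example, a quick sanity check (a sketch, assuming the mapred user exists on the node you submit from, as in the original post):

su - mapred -c "hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5"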

    #10746

Yes, running the MapReduce job as the same user who started the jobtracker (mapred in this case) works. But we want any user to be able to run jobs on this MR cluster. Is that not possible with HDP 1.1? Also, were there any permission-related changes in HDP 1.1?

    #10747
    Sasha J
    Moderator

Any user should be able to execute MapReduce jobs.
Each user should have its own "home" directory in HDFS.
For example:

    [root@node ~]# hadoop fs -ls /user
    Found 5 items
drwxrwx--- - ambari_qa hdfs 0 2012-10-04 14:06 /user/ambari_qa
drwxr-xr-x - hdfs hdfs 0 2012-10-09 11:47 /user/hdfs
drwx------ - hive hdfs 0 2012-10-09 11:49 /user/hive
drwxrwx--- - oozie hdfs 0 2012-09-11 18:28 /user/oozie
drwxr-xr-x - templeton hdfs 0 2012-09-11 18:35 /user/templeton

[root@node ~]# su - hbase -c "hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5"
    Number of Maps = 5
    Samples per Map = 5
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="user":hdfs:hdfs:rwxr-xr-x
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

[root@node ~]# su - hdfs -c "hadoop fs -mkdir /user/hbase"

[root@node ~]# su - hdfs -c "hadoop fs -chown -R hbase /user/hbase"
    [root@node ~]# hadoop fs -ls /user
    Found 6 items
drwxrwx--- - ambari_qa hdfs 0 2012-10-04 14:06 /user/ambari_qa
drwx------ - hbase hdfs 0 2012-10-09 11:51 /user/hbase
drwxr-xr-x - hdfs hdfs 0 2012-10-09 11:47 /user/hdfs
drwx------ - hive hdfs 0 2012-10-09 11:49 /user/hive
drwxrwx--- - oozie hdfs 0 2012-09-11 18:28 /user/oozie
drwxr-xr-x - templeton hdfs 0 2012-09-11 18:35 /user/templeton
[root@node ~]# su - hbase -c "hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5"
    Number of Maps = 5
    Samples per Map = 5
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Starting Job
    12/10/09 11:51:46 INFO mapred.FileInputFormat: Total input paths to process : 5
    12/10/09 11:51:47 INFO mapred.JobClient: Running job: job_201210091143_0003
    12/10/09 11:51:48 INFO mapred.JobClient: map 0% reduce 0%
    12/10/09 11:51:54 INFO mapred.JobClient: map 40% reduce 0%
    12/10/09 11:51:57 INFO mapred.JobClient: map 60% reduce 0%
    12/10/09 11:51:59 INFO mapred.JobClient: map 80% reduce 0%
    12/10/09 11:52:00 INFO mapred.JobClient: map 100% reduce 0%
    12/10/09 11:52:06 INFO mapred.JobClient: map 100% reduce 100%

    #10748

I created a home directory for user 'root':

    hadoop dfs -ls /user
    Found 1 items
drwxr-xr-x - root supergroup 0 2012-10-09 12:04 /user/root

    But when I run the job I still get the same error.

    hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5
    Number of Maps = 5
    Samples per Map = 5
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Starting Job
    12/10/09 12:06:33 INFO mapred.FileInputFormat: Total input paths to process : 5
    12/10/09 12:06:33 INFO mapred.JobClient: Running job: job_201210091002_0003
    12/10/09 12:06:34 INFO mapred.JobClient: map 0% reduce 0%
    12/10/09 12:06:34 INFO mapred.JobClient: Job complete: job_201210091002_0003
    12/10/09 12:06:34 INFO mapred.JobClient: Counters: 0
    12/10/09 12:06:34 INFO mapred.JobClient: Job Failed: Job initialization failed:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":mapred:supergroup:rwx------

    The namenode logs indicate the following:

2012-10-09 12:06:33,395 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9800, call create(/tmp/hadoop-mapred/mapred/system/job_201210091002_0003/jobToken, rwxr-xr-x, DFSClient_728313058, true, true, 3, 134217728) from 10.1.1.1:55198: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":mapred:supergroup:rwx------
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":mapred:supergroup:rwx------

Do I also have to fix configuration settings elsewhere?

    #10749
    Sasha J
    Moderator

    Works fine for me…

[root@node ~]# su - hdfs -c "hadoop fs -mkdir /user/root"
[root@node ~]# su - hdfs -c "hadoop fs -chown -R root /user/root"
    [root@node ~]# hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5
    Number of Maps = 5
    Samples per Map = 5
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Starting Job
    12/10/09 12:44:12 INFO mapred.FileInputFormat: Total input paths to process : 5
    12/10/09 12:44:12 INFO mapred.JobClient: Running job: job_201210091143_0004
    12/10/09 12:44:13 INFO mapred.JobClient: map 0% reduce 0%
    12/10/09 12:44:19 INFO mapred.JobClient: map 40% reduce 0%
    12/10/09 12:44:22 INFO mapred.JobClient: map 60% reduce 0%
    12/10/09 12:44:23 INFO mapred.JobClient: map 80% reduce 0%
    12/10/09 12:44:25 INFO mapred.JobClient: map 100% reduce 0%
    12/10/09 12:44:30 INFO mapred.JobClient: map 100% reduce 100%
    12/10/09 12:44:31 INFO mapred.JobClient: Job complete: job_201210091143_0004
    12/10/09 12:44:31 INFO mapred.JobClient: Counters: 30
    12/10/09 12:44:31 INFO mapred.JobClient: Job Counters
    12/10/09 12:44:31 INFO mapred.JobClient: Launched reduce tasks=1
    12/10/09 12:44:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18142
    12/10/09 12:44:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    12/10/09 12:44:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    12/10/09 12:44:31 INFO mapred.JobClient: Launched map tasks=5
    12/10/09 12:44:31 INFO mapred.JobClient: Data-local map tasks=5
    12/10/09 12:44:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11352
    12/10/09 12:44:31 INFO mapred.JobClient: File Input Format Counters
    12/10/09 12:44:31 INFO mapred.JobClient: Bytes Read=590
    12/10/09 12:44:31 INFO mapred.JobClient: File Output Format Counters
    12/10/09 12:44:31 INFO mapred.JobClient: Bytes Written=97
    12/10/09 12:44:31 INFO mapred.JobClient: FileSystemCounters
    12/10/09 12:44:31 INFO mapred.JobClient: FILE_BYTES_READ=65
    12/10/09 12:44:31 INFO mapred.JobClient: HDFS_BYTES_READ=1170
    12/10/09 12:44:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=161531
    12/10/09 12:44:31 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
    12/10/09 12:44:31 INFO mapred.JobClient: Map-Reduce Framework
    12/10/09 12:44:31 INFO mapred.JobClient: Map output materialized bytes=175
    12/10/09 12:44:31 INFO mapred.JobClient: Map input records=5
    12/10/09 12:44:31 INFO mapred.JobClient: Reduce shuffle bytes=175
    12/10/09 12:44:31 INFO mapred.JobClient: Spilled Records=20
    12/10/09 12:44:31 INFO mapred.JobClient: Map output bytes=90
    12/10/09 12:44:31 INFO mapred.JobClient: Total committed heap usage (bytes)=1200553984
    12/10/09 12:44:31 INFO mapred.JobClient: CPU time spent (ms)=3410
    12/10/09 12:44:31 INFO mapred.JobClient: Map input bytes=120
    12/10/09 12:44:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=580
    12/10/09 12:44:31 INFO mapred.JobClient: Combine input records=0
    12/10/09 12:44:31 INFO mapred.JobClient: Reduce input records=10
    12/10/09 12:44:31 INFO mapred.JobClient: Reduce input groups=10
    12/10/09 12:44:31 INFO mapred.JobClient: Combine output records=0
    12/10/09 12:44:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=742219776
    12/10/09 12:44:31 INFO mapred.JobClient: Reduce output records=0
    12/10/09 12:44:31 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6451920896
    12/10/09 12:44:31 INFO mapred.JobClient: Map output records=10
    Job Finished in 19.425 seconds
    Estimated value of Pi is 3.68000000000000000000
    [root@node ~]#

What is in your hdfs-site.xml, mapred-site.xml, and core-site.xml?
As you can see, your job is trying to use /tmp/hadoop-mapred/mapred/system, which it should not.
Did you make any changes in the configuration files?

    #10750

Is there some way to print the value of mapred.system.dir? I have a feeling it is currently set to /tmp/hadoop-mapred/mapred/system.
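
One quick way to check what the site files set explicitly (a sketch; built-in defaults will not appear, and /etc/hadoop/conf is assumed to be the active config directory):

grep -E -A1 'mapred\.system\.dir|hadoop\.tmp\.dir' /etc/hadoop/conf/*-site.xml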

I am listing the contents of the config files:

hdfs-site.xml

    dfs.block.size
    134217728

    dfs.name.dir
    /state/partition1/apache-hdfs/namedir

    dfs.hosts.exclude
    /usr/lib/hadoop/conf/apache-hdfs/hdfs/dfs.exclude

    dfs.secondary.http.address
    aishdev.local:50090

    dfs.http.address
    aishdev.local:50070

    dfs.name.dir
    /state/partition1/apache-hdfs/snamedir

    dfs.block.size
    134217728

    core-site.xml

    fs.default.name
    hdfs://aishdev.local:9800/

    topology.script.file.name
    /opt/rocks/bin/hadoop-topology

    io.file.buffer.size
    131072

    io.file.buffer.size
    131072

    mapred-site.xml

    mapred.job.tracker
    aishdev:9801

    io.sort.mb
    160

    io.sort.spill.percent
    1.0

    io.sort.factor
    100

    mapred.child.java.opts
    -Xmx1g

    mapred.jobtracker.taskScheduler
    org.apache.hadoop.mapred.FairScheduler

    io.sort.record.percent
    0.138

    #10751
    Sasha J
    Moderator

It should be in mapred-site.xml; it is set by default during installation.

<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
  <description>No description</description>
  <final>true</final>
</property>

You can check the current settings using the JobTracker UI.
How did you set it up?
It is very strange that this property is missing; the HDP installers always put it in place…

    #10755

From the JobTracker UI (thanks for this!) I see the following values:
    name value
    hadoop.tmp.dir /tmp/hadoop-${user.name}
    mapred.system.dir ${hadoop.tmp.dir}/mapred/system

I later tried changing hadoop.tmp.dir in core-site.xml to '/user/root' (a directory owned by the user) and restarted the jobtracker, but I still get the same error. I am not able to change the 'hadoop.tmp.dir' property; however, if I change other properties in the file, they do get updated in the JobTracker UI.

    #10756
    Sasha J
    Moderator

How did you install HDP?
What installation method did you use?

Also, it seems those values are related to the LOCAL directories, which store temporary files.
Can you check the permissions for /tmp on your node(s)?

    #10757

Apologies, I restarted both the jobtracker and the namenode again, and the properties are now picked up.

    hadoop.tmp.dir /user/root
    mapred.system.dir ${hadoop.tmp.dir}/mapred/system

But I still get the same error; Hadoop tries to write to the same tmp dir. The namenode error message is as follows (from the logs):

2012-10-09 14:50:37,399 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9800, call create(/tmp/hadoop-mapred/mapred/system/job_201210091002_0017/jobToken, rwxr-xr-x, DFSClient_728313058, true, true, 3, 134217728) from 10.1.1.1:56744: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":mapred:supergroup:rwx------

The above line should actually read /user/root/mapred/system/job_201210091002_0017 instead of /tmp/hadoop-mapred/mapred/system/job_201210091002_0017, since mapred.system.dir is now /user/root/mapred/system. What am I doing wrong here?

    #10758

    I installed HDP from the rpm.

Permissions on /tmp are as follows:

    hadoop dfs -ls /tmp
    Found 1 items
drwxrwxrwx - mapred supergroup 0 2012-10-09 14:23 /tmp/hadoop-mapred

    #10759
    Sasha J
    Moderator

This means that you made some mistakes in the configuration setup…

Make sure your /tmp directory has the following permissions and ownership:

    drwxrwxrwt. 40 root root 4096 Oct 9 14:58 tmp

Also, the settings should be as follows:

    hadoop.tmp.dir /tmp/hadoop-${user.name}
    mapred.system.dir /mapred/system

mapred.system.dir is an HDFS location; hadoop.tmp.dir is a LOCAL node location…
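
A quick way to see the distinction on a running node (a sketch, using the paths discussed in this thread):

# hadoop.tmp.dir lives on each node's LOCAL filesystem:
ls -ld /tmp/hadoop-*
# mapred.system.dir lives in HDFS:
hadoop fs -ls /mapred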

    #10760

The default value for mapred.system.dir is ${hadoop.tmp.dir}/mapred/system and the default value for hadoop.tmp.dir is /tmp/hadoop-${user.name}. I removed the custom values that I added.

Also, the hadoop.tmp.dir directory seems to be created when the jobtracker is started. Since the jobtracker is started by user 'mapred', it creates /tmp/hadoop-mapred in the HDFS filesystem, not the local filesystem. I did create the directory in the local filesystem now, however.

    ls /tmp
    drwxr-xr-x 2 root root 4096 Oct 9 15:38 hadoop-root

Things still don't work.
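
For reference, this is how those defaults expand when the jobtracker runs as mapred, which is why /tmp/hadoop-mapred/mapred/system keeps appearing (a sketch of the substitution, not output from the cluster above):

# hadoop.tmp.dir    = /tmp/hadoop-${user.name}         ->  /tmp/hadoop-mapred
# mapred.system.dir = ${hadoop.tmp.dir}/mapred/system  ->  /tmp/hadoop-mapred/mapred/system
# ${user.name} resolves to the user running the daemon (mapred for the jobtracker),
# and mapred.system.dir is interpreted as a path in HDFS.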

    #10761
    Sasha J
    Moderator

    Default values are:

    hadoop.tmp.dir /tmp/hadoop-${user.name}
    mapred.system.dir /mapred/system

I do not have hadoop.tmp.dir defined in my configuration; mapred.system.dir is defined as /mapred/system.
And this works like a charm.
Please remove hadoop.tmp.dir from your configuration and restart both MapReduce and HDFS.

    #10762

I removed hadoop.tmp.dir from my config and inserted the following property/value pair in core-site.xml:
mapred.system.dir ${hadoop.tmp.dir}/mapred/system

I restarted both the namenode and jobtracker after this (twice). When I view the config via the JobTracker UI, I don't see the change reflected:

mapred.system.dir ${hadoop.tmp.dir}/mapred/system

    As an aside, I really thank you for your prompt replies :)

    #10763

Sorry, typo: the mapred.system.dir property I inserted (in core-site.xml) is as follows:

    mapred.system.dir
    /mapred/system

    #10764
    Sasha J
    Moderator

By default, it goes in mapred-site.xml.
The syntax is:

<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
  <description>No description</description>
  <final>true</final>
</property>

    #10766
    Sasha J
    Moderator

Sorry, the forum is eating all the XML tags….

    #10770
    Sasha J
    Moderator

I did remove all the < and > from the code… here it is again with the tags spelled out:

    property
    name mapred.system.dir /name
    value /mapred/system /value
    description No description /description
    final true /final
    /property

    #10771
    Sasha J
    Moderator

Please put it back in place.

    #10772

Sweet, I see mapred.system.dir in the job config UI now:
mapred.system.dir /mapred/system

But the job is still not running, giving the same error:

2012-10-09 16:35:49,303 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9800, call create(/tmp/hadoop-mapred/mapred/system/job_201210091002_0024/jobToken, rwxr-xr-x, DFSClient_728313058, true, true, 3, 134217728) from 10.1.1.1:57953: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="job_201210091002_0024":mapred:supergroup:rwx------

mapred.system.dir seems to be getting overridden by /tmp/hadoop-mapred/mapred/system/.

    #10773
    Sasha J
    Moderator

Did you apply the change across all of the cluster nodes?
Make sure the configuration files are consistent on all nodes in the cluster,
and then restart all the tasktrackers and the jobtracker.
By the way, what was the reason for installing from RPMs?
Why didn't you use the HMC installation?
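
A minimal way to push the change to every node (a sketch; the node names are placeholders and /etc/hadoop/conf is assumed to be the active configuration directory):

for h in node1 node2 node3; do
  scp /etc/hadoop/conf/mapred-site.xml $h:/etc/hadoop/conf/
done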

    #10774

Looks like I did not apply the changes to all nodes. It seems to write to /mapred/system now, but the error still persists:

2012-10-09 19:18:53,528 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":hdfs:supergroup:rwx------
2012-10-09 19:18:53,529 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9800, call create(/mapred/system/job_201210091918_0001/jobToken, rwxr-xr-x, DFSClient_-288163603, true, true, 3, 134217728) from 10.1.1.1:58981: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="system":hdfs:supergroup:rwx------

    #10779
    Sasha J
    Moderator

What is the exact command line you use to start your job?
Is there any specific configuration you set inside the Java code?
Can we see the namenode, datanode, tasktracker, and jobtracker log files?

    #10780
    Sasha J
    Moderator

    Why didn’t you use HMC for installation?

    #10781

I'm simply trying to run 'hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 5 5'.

    Namenode logs:
2012-10-10 11:45:26,858 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9800, call create(/mapred/sytem/job_201210101133_0003/jobToken, rwxr-xr-x, DFSClient_463505111, true, true, 3, 134217728) from 10.1.1.1:36279: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="sytem":hdfs:supergroup:rwx------
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="sytem":hdfs:supergroup:rwx------
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:199)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:155)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:125)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5308)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5282)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1259)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1211)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:605)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

    Tasktracker log:
12/10/10 11:45:27 INFO mapred.JobClient: Job Failed: Job initialization failed:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="sytem":hdfs:supergroup:rwx------
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.(DFSClient.java:3271)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:733)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:193)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:435)
    at org.apache.hadoop.security.Credentials.writeTokenStorageFile(Credentials.java:169)
    at org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3537)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:696)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3981)
    at org.apache.hadoop.mapred.FairScheduler$JobInitializer$InitJob.run(FairScheduler.java:291)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Can you please send me your job.xml file? I have removed all the previous parameters that you mentioned; I only have 'mapred.system.dir' set to /mapred/system.

su hdfs -c 'hadoop dfs -ls /mapred'
    Found 1 items
drwx------ - hdfs hadoop 0 2012-10-10 12:05 /mapred/system
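
Two things stand out in this post. First, the namenode log above references /mapred/sytem (note the missing "s"), which suggests the configured value itself may contain a typo. Second, /mapred/system in this listing is owned by hdfs, while the jobtracker (per the first post) runs as mapred, and the system dir must be owned by the user the jobtracker runs as. Assuming the jobtracker should run as mapred, a sketch of the ownership fix:

su - hdfs -c "hadoop fs -chown -R mapred /mapred"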

    #10782

Can you please point me to a step-by-step guide for installing from RPMs? Similar to https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-Step3%3ACreateandConfigurethe%7B%7Bmapred.system.dir%7D%7DDirectoryinHDFS (Customizing the Configuration without Using a Configuration Package).

    #10783
    Sasha J
    Moderator

    It looks like your HDFS ownership is not correct…
    You created all those folders manually, right?

    This is what I have in the working system:

    [root@node ~]# hadoop dfs -ls /
    Found 4 items
drwxr-xr-x - hdfs hdfs 0 2012-10-09 11:45 /apps
drwx------ - mapred hdfs 0 2012-10-09 14:58 /mapred
drwxrwxrwx - hdfs hdfs 0 2012-10-04 17:58 /tmp
drwxr-xr-x - hdfs hdfs 0 2012-10-09 12:44 /user
[root@node ~]# su - hdfs -c "hadoop dfs -ls /mapred"
Found 2 items
drwx------ - mapred hdfs 0 2012-10-09 11:47 /mapred/history
drwx------ - mapred hdfs 0 2012-10-09 14:58 /mapred/system
    [root@node ~]#

    Which job.xml are you talking about?

    #10784

The job.xml configuration file that I can view via the JobTracker UI. Can you please send it over?

    #10785

No, I did not create these folders manually. But I have done so now (after formatting HDFS and restarting the services). However, I am not able to start the jobtracker:

    2012-10-10 12:22:46,085 WARN org.apache.hadoop.mapred.JobTracker: Bailing out …
    org.apache.hadoop.security.AccessControlException: The systemdir hdfs://aishdev.local:9800/mapred/system is not owned by hdfs
    at org.apache.hadoop.mapred.JobTracker.initialize(JobTracker.java:1946)
    at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2314)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4766)
    2012-10-10 12:22:46,086 FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://foo:9800/mapred/system is not owned by hdfs
    at org.apache.hadoop.mapred.JobTracker.initialize(JobTracker.java:1946)
    at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2314)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4766)

    2012-10-10 12:22:46,094 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down JobTracker at aishdev.stacki.com/192.168.2.142

    #10786
    Sasha J
    Moderator

The document you requested in a previous post is here:
http://docs.hortonworks.com/CURRENT/Deploying_Hortonworks_Data_Platform/Using_gsInstaller/System_Requirements_For_Test_And_Production_Clusters.htm

Regarding your permissions, can you execute:
su - hdfs -c "hadoop dfs -ls /mapred"

    and provide the output.

    #10787

su hdfs -c "hadoop dfs -ls /mapred"
Found 1 items
drwx------ - mapred hdfs 0 2012-10-10 12:21 /mapred/system

    #10788

Should all users who run mapred jobs belong to a particular group? Also, can I please take a look at your job.xml file (the one from the JobTracker UI)?

    #10789

Say I have a directory called /user/foo in HDFS, owned by user=foo (this was done by running chown -R). When a job runs and creates a file or directory (newdir) inside /user/foo, the owner of /user/foo/newdir becomes the hdfs user (the same user who started the namenode). Shouldn't newdir also be owned by user=foo? But this doesn't seem to be the case with HDP 1.1.

    #10790

Okay, this works if we disable the FairScheduler. This is the only thing we are doing differently from you. Please turn on the FairScheduler by setting 'mapred.jobtracker.taskScheduler' to 'org.apache.hadoop.mapred.FairScheduler' and you will see that it doesn't work.

We also found this: https://issues.apache.org/jira/browse/MAPREDUCE-4398
Looks like this is already a known bug.
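
For anyone else hitting this, a sketch of the workaround: remove the FairScheduler setting from mapred-site.xml on all nodes (or set it back to the stock Hadoop 1.x scheduler, JobQueueTaskScheduler) and restart the jobtracker:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>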

    #10791
    Sasha J
    Moderator

OK, this explains things.
This is a known problem with the FairScheduler; it works incorrectly.
As you can see, a bug is open in Apache's JIRA.

Since HDP provides its own configuration files, we have the FairScheduler disabled by default because of this bug.
Because you put in your own configuration, you hit the problem.
It all works OK if you use a correct configuration with no FairScheduler:

    [test@node ~]$ hadoop jar /usr/lib/hadoop/hadoop-examples.jar teragen 10000 in
    Generating 10000 using 2 maps with step of 5000
    12/10/10 17:08:25 INFO mapred.JobClient: Running job: job_201210101659_0002
    12/10/10 17:08:26 INFO mapred.JobClient: map 0% reduce 0%
    12/10/10 17:08:34 INFO mapred.JobClient: map 100% reduce 0%
    12/10/10 17:08:34 INFO mapred.JobClient: Job complete: job_201210101659_0002
    12/10/10 17:08:34 INFO mapred.JobClient: Counters: 19
    12/10/10 17:08:34 INFO mapred.JobClient: Job Counters
    12/10/10 17:08:34 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9808
    12/10/10 17:08:34 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    12/10/10 17:08:34 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    12/10/10 17:08:34 INFO mapred.JobClient: Launched map tasks=2
    12/10/10 17:08:34 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
    12/10/10 17:08:34 INFO mapred.JobClient: File Input Format Counters
    12/10/10 17:08:34 INFO mapred.JobClient: Bytes Read=0
    12/10/10 17:08:34 INFO mapred.JobClient: File Output Format Counters
    12/10/10 17:08:34 INFO mapred.JobClient: Bytes Written=1000000
    12/10/10 17:08:34 INFO mapred.JobClient: FileSystemCounters
    12/10/10 17:08:34 INFO mapred.JobClient: HDFS_BYTES_READ=164
    12/10/10 17:08:34 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53160
    12/10/10 17:08:34 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1000000
    12/10/10 17:08:34 INFO mapred.JobClient: Map-Reduce Framework
    12/10/10 17:08:34 INFO mapred.JobClient: Map input records=10000
    12/10/10 17:08:34 INFO mapred.JobClient: Physical memory (bytes) snapshot=141197312
    12/10/10 17:08:34 INFO mapred.JobClient: Spilled Records=0
    12/10/10 17:08:34 INFO mapred.JobClient: CPU time spent (ms)=900
    12/10/10 17:08:34 INFO mapred.JobClient: Total committed heap usage (bytes)=120455168
    12/10/10 17:08:34 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2122698752
    12/10/10 17:08:34 INFO mapred.JobClient: Map input bytes=10000
    12/10/10 17:08:34 INFO mapred.JobClient: Map output records=10000
    12/10/10 17:08:34 INFO mapred.JobClient: SPLIT_RAW_BYTES=164
    [test@node ~]$ hadoop fs -ls /user
    Found 8 items
drwxrwx--- - ambari_qa hdfs 0 2012-10-09 17:00 /user/ambari_qa
drwx------ - hbase hdfs 0 2012-10-09 11:52 /user/hbase
drwxr-xr-x - hdfs hdfs 0 2012-10-09 11:47 /user/hdfs
drwx------ - hive hdfs 0 2012-10-09 17:00 /user/hive
drwxrwx--- - oozie hdfs 0 2012-09-11 18:28 /user/oozie
drwx------ - root hdfs 0 2012-10-09 12:44 /user/root
drwxr-xr-x - templeton hdfs 0 2012-09-11 18:35 /user/templeton
drwx------ - test hdfs 0 2012-10-10 17:08 /user/test
    [test@node ~]$ hadoop fs -ls /user/test
    Found 2 items
drwx------ - test hdfs 0 2012-10-10 17:08 /user/test/.staging
drwx------ - test hdfs 0 2012-10-10 17:08 /user/test/in
    [test@node ~]$

The topic 'Permission issues while running mr-jobs' is closed to new replies.
