Upload Data /Files on EC2

This topic contains 9 replies, has 2 voices, and was last updated by tedr 1 year, 3 months ago.

  • Creator
    Topic
  • #27634

    Anupam Gupta
    Participant

    Hi, I have created 2 instances on EC2 and successfully installed HDP 1.2.0 using Ambari (automated installation).
    Now I want to know how to upload data or files onto it. How can I do that?

Viewing 9 replies - 1 through 9 (of 9 total)

The topic ‘Upload Data /Files on EC2’ is closed to new replies.

  • Author
    Replies
  • #28046

    tedr
    Moderator

    Hi Agupta,

    Yes, you can see the files, though not their contents, by browsing the file system as you mention. You can also see a listing of the files in HDFS by using the command ‘hadoop fs -ls -R /’, which will list all of the files in HDFS. From the command you said you used to put the files into HDFS, it looks like you did not put them in /user/root; it looks like you put them in /hadoop/hdfs/data, so look for them there.
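
    For example, here is a quick way to check (assuming, as your earlier command suggests, that the file went into /hadoop/hdfs/data in HDFS):

    su - hdfs
    hadoop fs -ls -R / | grep CACFP
    hadoop fs -ls /hadoop/hdfs/data

    The first listing searches all of HDFS for your file name; the second lists the directory you copied into.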

    Thanks,
    Ted.

    #28032

    Anupam Gupta
    Participant

    Hi Ted,
    Thanks for your support; we have loaded data onto HDFS.
    Can we see the contents of the file in Ambari using Browse the file system?
    If we can, then how (it shows the file is inside /user/root)?

    Thanks

    #28005

    tedr
    Moderator

    Hi Agupta,

    You should have copied the file to /user/root in HDFS, not /hadoop/hdfs/data. But now that it's there, it can be analysed from where it is; you just don't do that with Ambari. Ambari is for installing, managing and monitoring the cluster, not for analysing the data. HDP does provide some tools commonly used for analysing data: Hive, Pig and MapReduce. Hive and Pig are essentially scripting languages for data analysis; Hive is SQL-like and Pig uses its own language. MapReduce is essentially an API for Java apps to submit to the cluster. To learn more about which to use, I suggest you read Tom White’s excellent book ‘Hadoop: The Definitive Guide’. Another option for learning how to use Hadoop is to download and run our Sandbox.
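
    As a quick taste of MapReduce, you could try the bundled word-count example once some text data is in HDFS. This is only a rough sketch; the examples jar path is an assumption for HDP 1.x, so adjust it to wherever hadoop-examples.jar lives on your nodes:

    su - hdfs
    hadoop fs -mkdir /user/hdfs/wc-input
    hadoop fs -put /etc/hadoop/conf/hdfs-site.xml /user/hdfs/wc-input
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /user/hdfs/wc-input /user/hdfs/wc-output
    hadoop fs -cat /user/hdfs/wc-output/part*

    Hive and Pig work the same way at a high level: the data stays in HDFS and the tools run MapReduce jobs over it for you.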

    Thanks,
    Ted.

    #27987

    Anupam Gupta
    Participant

    Hi,
    Now I have a /user/root directory on HDFS.
    I have uploaded data successfully using the following command:
    hadoop fs -copyFromLocal /usr/local/CACFP_excel-data.xls /hadoop/hdfs/data

    /hadoop/hdfs/data is the datanode directory.
    Is it right to copy data to the datanode directory?
    If it is, how do I analyse that data? Can Ambari do this for me?
    Thanks,

    #27932

    tedr
    Moderator

    Hi Agupta,

    Files and folders in HDFS are not locatable on a node's local file system. To see files and folders in HDFS you need to run the command ‘hadoop fs -ls /’, and you will need to run it as the user hdfs.
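
    For example, from any node in the cluster (a minimal sketch):

    su - hdfs
    hadoop fs -ls /
    hadoop fs -ls -R /user

    The first listing shows the top-level HDFS directories; the recursive listing shows everything under /user.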

    Thanks,
    Ted.

    #27925

    Anupam Gupta
    Participant

    Hi Ted,
    I am still not able to upload data to HDFS, although I did all the things you mentioned.
    ip-10-196-61-75 is my master node (namenode and jobtracker).
    When I run
    hadoop fs -mkdir /user/root

    it shows me
    mkdir: cannot create directory /user/root: File exists

    But I have looked, and there is no user directory on any node:

    -bash-4.1# cd ..
    -bash-4.1# ls
    bin dev hadoop lib lost+found mnt proc sbin srv tmp var
    boot etc home lib64 media opt root selinux sys usr
    -bash-4.1# ssh n1
    Last login: Fri Jun 21 05:51:10 2013 from ip-10-40-93-176.ec2.internal
    [root@ip-10-196-61-75 ~]# cd ..
    [root@ip-10-196-61-75 /]# ls
    bin dev hadoop lib lost+found mnt proc sbin srv tmp var
    boot etc home lib64 media opt root selinux sys usr
    [root@ip-10-196-61-75 /]# cd usr
    [root@ip-10-196-61-75 usr]# cd local
    [root@ip-10-196-61-75 local]# ls
    CACFP_excel-data.xls bin etc games include lib lib64 libexec sbin share src
    [root@ip-10-196-61-75 local]# cd ..
    [root@ip-10-196-61-75 usr]# cd ..
    [root@ip-10-196-61-75 /]# su - hdfs
    -bash-4.1$ hadoop fs -mkdir /user/root
    mkdir: cannot create directory /user/root: File exists
    -bash-4.1$ hadoop fs -chown root:hdfs /user/root
    -bash-4.1$ exit
    logout
    [root@ip-10-196-61-75 /]# hadoop fs -copyFromLocal /usr/local/CACFP_excel-data.xls /hadoop/hdfs/
    copyFromLocal: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x

    Please Help
    Thanks

    #27888

    tedr
    Moderator

    Hi Agupta,

    The error you are getting shows that you have not created a directory in Hadoop for the user ‘root’. In HDFS the user ‘root’ has no special privileges; it is just another user to HDFS. To be able to put your file in HDFS, execute the following commands on the master node of your cluster:
    1) su - hdfs
    2) hadoop fs -mkdir /user/root
    3) hadoop fs -chown root:hdfs /user/root
    4) exit

    Now you should be able to put the file into HDFS. Please note that the way you were attempting to put the file into HDFS would have put it in /, and that is not a great idea; that is why I mention that you should create the root user directory.
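
    For example, as root on the master node (using the local path from your earlier post; adjust it if yours differs):

    hadoop fs -copyFromLocal /hadoop/xyz.xls /user/root/
    hadoop fs -ls /user/root

    The second command just confirms the file landed where you expect.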

    Thanks,
    Ted.

    #27883

    Anupam Gupta
    Participant

    Hi all,

    I am trying to upload a data file to the Hadoop (HDP) HDFS on EC2, but have not been able to do it. What I have tried so far is
    to copy the data file from my local machine to the EC2 CentOS instance using the following command:

    c:\> pscp -i C:\Hadoop.ppk C:\xyz.xls root@—-.com:/hadoop/xyz.xls

    After that I am trying to put that file into /hadoop/hdfs using this command:

    hadoop fs -put /hadoop/xyz.xls /hadoop/hdfs

    but I am getting the following exception:

    put: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x

    I have Googled it and found that we have to change /etc/hadoop-0.20/conf.empty/hdfs-site.xml and set the dfs.permissions value to false, as follows:

    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>

    After doing this we need to stop and start Hadoop using bin/stop-all.sh and bin/start-all.sh respectively.

    But after doing all these things I am still getting the same exception:

    put: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x

    Any help regarding this is highly appreciated. Please help, someone.
    Thanks in advance.

    #27667

    tedr
    Moderator

    Hi Agupta,

    The easiest-to-understand method of getting files onto this cluster on EC2 is to:
    1) scp the files from your local box to one of the nodes in the cluster, then
    2) use any ‘hadoop fs’ file operations to move them from the local dir into HDFS.
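
    For example, a rough sketch (the key file, host name and paths here are just placeholders for your own):

    # on your local machine
    scp -i your-key.pem /path/to/datafile.xls root@<node-public-dns>:/tmp/

    # then on that node, as the hdfs user
    su - hdfs
    hadoop fs -mkdir /user/hdfs/uploads
    hadoop fs -put /tmp/datafile.xls /user/hdfs/uploads/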

    Thanks,
    Ted.
