Hortonworks Sandbox Forum

Sandbox – Pig Basic Tutorial example is not working

  • #17798
    Sankarg
    Member

    Hi, I just tried the following script from the Pig Basic Tutorial, and it is not working:

    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = group b all;
    d = FOREACH c GENERATE AVG(b.stock_volume);
    dump d;

    When I ran the syntax check, the following log output was captured.

    2013-03-17 14:35:28,456 [main] INFO org.apache.pig.Main – Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
    2013-03-17 14:35:28,459 [main] INFO org.apache.pig.Main – Logging error messages to: /home/sandbox/hue/pig_1363556128447.log
    2013-03-17 14:35:41,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: file:///
    2013-03-17 14:35:45,555 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    Details at logfile: /home/sandbox/hue/pig_1363556128447.log

    please do the needful to resolve this issue. Thank you!

    Regards,
    Sankar


  • #17884
    tedr
    Moderator

    Hi Sankar,

    Thanks for trying the Hortonworks Sandbox,

    I am looking into why this might be happening and what to do about it. I will get back to you as soon as I have something definitive.

    Thanks,
    Ted.

    #17902
    tedr
    Member

    Hi Sankar,

    I could not replicate your problem. I have noticed that VirtualBox sometimes imports the VM incorrectly, so you could try re-importing the VM while I continue trying to replicate your issue.

    Thanks,
    Ted.

    #17959
    Sankarg
    Member

    I removed every line of the Pig example except the first and ran the syntax check again. I then noticed the message “Detected Tx hang” in VirtualBox.

    #18104
    Larry Liu
    Moderator

    Hi, Sankar,

    Can you please provide the log from your error message: /home/sandbox/hue/pig_1363556128447.log

    From my initial look, the error says “2013-03-17 14:35:45,555 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]”. Most likely HCatLoader is not on the Pig classpath.

    Larry

    #18168
    Larry Liu
    Moderator

    Hi, Sankar,

    Here is a workaround you could try to see if it works:

    1. Log in to the sandbox as root.
    2. Edit the file /usr/bin/pig.
    3. Replace the last line with the following line:
    exec /usr/lib/pig/bin/pig -useHCatalog "$@"

    I will continue looking into other solutions.

    Thanks
    Larry
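
The three steps in the workaround above can also be scripted. The following is a minimal sketch rather than an official fix: patch_pig_wrapper is a hypothetical helper name, sed -i assumes GNU sed, and the demo runs against a scratch file instead of the real /usr/bin/pig.

```shell
#!/bin/sh
# Sketch: replace the last line of a Pig wrapper script so that it passes
# -useHCatalog. patch_pig_wrapper is a hypothetical helper, not part of Pig.
patch_pig_wrapper() {
    wrapper="$1"
    # Keep a backup so the change can be undone.
    cp -p "$wrapper" "$wrapper.bak" || return 1
    # '$' addresses the last line; '|' is the delimiter because the
    # replacement text contains slashes.
    sed -i '$s|.*|exec /usr/lib/pig/bin/pig -useHCatalog "$@"|' "$wrapper"
}

# Demo on a scratch file with invented contents.
demo=$(mktemp)
printf '#!/bin/sh\nexec /usr/lib/pig/bin/pig "$@"\n' > "$demo"
patch_pig_wrapper "$demo"
tail -n 1 "$demo"
```

On the sandbox this would be run as root as patch_pig_wrapper /usr/bin/pig; the .bak copy makes it easy to revert.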

    #18605
    Rahul Jolly
    Member

    Hi Ted – what would be the fix for this error we’ve been seeing below. I am getting the same.
    Would you please let us know the steps we can take to resolve this.

    Thank you.
    Rahul

    #18625
    Seth Lyubich
    Moderator

    Hi Rahul,

    Can you please clarify which error you are referring to?

    Thanks,
    Seth

    #18774
    Peter Rudenko
    Moderator

    Hi Sankarg, you could fix it by running the following line on the VM:
    sed -i '49s/.*/includeHCatalog=true;/' /usr/lib/pig/bin/pig
    or by following the instructions that Larry suggested.
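
The one-liner above overwrites line 49 unconditionally, which only helps if that line is the includeHCatalog assignment in your copy of the script. A more cautious sketch, assuming GNU sed and a hypothetical helper name enable_hcatalog, matches the assignment by pattern (as a later post in this thread also does) and leaves the file alone if no such line exists. The demo file contents are invented stand-ins, not the real launcher.

```shell
#!/bin/sh
# enable_hcatalog is a hypothetical helper, not part of Pig.
enable_hcatalog() {
    script="$1"
    # Show what will be changed first, so nothing is overwritten blindly.
    if ! grep -n '^includeHCatalog=' "$script"; then
        echo "no includeHCatalog= line in $script; leaving it unchanged" >&2
        return 1
    fi
    cp -p "$script" "$script.bak"
    sed -i 's/^includeHCatalog=.*/includeHCatalog=true;/' "$script"
}

# Demo on an invented stand-in file; the real target would be /usr/lib/pig/bin/pig.
demo=$(mktemp)
printf 'debug=false\nincludeHCatalog="";\n' > "$demo"
enable_hcatalog "$demo"
grep -c '^includeHCatalog=true;$' "$demo"
```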

    #18800
    Kirill Wood
    Member

    I tried this basic pig example too and got the same error as Sankarg:

    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = GROUP b all;
    d = FOREACH c GENERATE avg(b.stock_volume);
    dump d;

    Syntax check gave me this error:
    ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    As Larry suggested, I edited the last line of the file /usr/bin/pig to exec /usr/lib/pig/bin/pig -useHCatalog "$@".

    The previous error is gone but now it throws another error:
    2013-03-23 20:46:30,499 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: file:///
    2013-03-23 20:46:32,452 [main] WARN org.apache.hadoop.hive.conf.HiveConf – DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    2013-03-23 20:46:32,739 [main] INFO hive.metastore – Trying to connect to metastore with URI thrift://sandbox:9083
    2013-03-23 20:46:33,160 [main] INFO hive.metastore – Waiting 1 seconds before next connection attempt.
    2013-03-23 20:46:34,160 [main] INFO hive.metastore – Connected to metastore.
    2013-03-23 20:46:36,073 [main] WARN org.apache.hadoop.hive.conf.HiveConf – DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    2013-03-23 20:46:39,105 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve avg using imports: [, org.apache.pig.builtin., org.apache.pig.impl..builtin.]
    Details at logfile: /home/sandbox/hue/pig_1364096789805.log

    log file:
    Failed to parse: Pig script failed to parse:
    Failed to generate logical plan. Nested exception: java.lang.RuntimeException: Cannot instantiate: avg at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)

    How can I fix this?

    Thank you

    #18836
    Larry Liu
    Moderator

    Hi, Kirill,

    Just wondering if you are using the new version of the sandbox? Have you been able to try what Peter suggested?

    sed -i '49s/.*/includeHCatalog=true;/' /usr/lib/pig/bin/pig

    Larry

    #18928
    Kirill Wood
    Member

    Hi Larry,
    I haven’t tried Peter’s way, but I tried your fix:
    1. Log in to the sandbox as root.
    2. Edit the file /usr/bin/pig.
    3. Replace the last line with the following line:
    exec /usr/lib/pig/bin/pig -useHCatalog "$@"

    That fixed the original error on HCatLoader which let me proceed with the script:
    ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    But now, it throws a new error when I use the ‘avg’ function below:
    FOREACH c GENERATE avg(b.stock_volume);

    ERROR 1070: Could not resolve avg using imports: [, org.apache.pig.builtin., org.apache.pig.impl..builtin.]

    I get the same error even in Pig tutorial 2 when I try to use the ‘max’ function in this line:
    FOREACH grp_data GENERATE group as grp, max(runs.runs) as max_runs;

    log file says:
    Failed to parse: Pig script failed to parse:
    Failed to generate logical plan. Nested exception: java.lang.RuntimeException: Cannot instantiate: avg at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)

    Thanks!

    #18975
    Sasha J
    Moderator

    Kirill,
    it looks like your classpath is broken somehow…
    I just ran this tutorial on a freshly loaded Sandbox and it works perfectly fine…
    I suggest you reload the Sandbox and then try again.

    Thank you!
    Sasha

    #19300

    I’ve downloaded the Sandbox today, and followed the instructions on the left-hand side of the Web browser. I have exactly the same problem as the one documented in this thread.

    2013-03-28 09:30:42,524 [main] INFO org.apache.pig.Main – Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
    2013-03-28 09:30:42,524 [main] INFO org.apache.pig.Main – Logging error messages to: /home/sandbox/hue/pig_1364488242522.log
    2013-03-28 09:30:42,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: file:///
    2013-03-28 09:30:43,040 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    Details at logfile: /home/sandbox/hue/pig_1364488242522.log

    #19414
    tedr
    Member

    Hi Stephen,

    Have you tried any of the fixes mentioned earlier in this thread?

    Thanks,
    Ted.

    #19560
    Kirill Wood
    Member

    So I found out that UDF names are case sensitive, which means the correct syntax is all caps:
    MAX(runs.runs), not max
    AVG(b.stock_volume), not avg
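
Pig keywords such as LOAD, FILTER and GROUP are case insensitive, but function names such as AVG and MAX are not, which is why avg(...) fails with ERROR 1070. One quick way to spot this before running a script is to search it for lower-case calls. The sketch below assumes the script is saved to a file; find_lowercase_builtins and the short list of function names it checks are illustrative choices, not a Pig feature.

```shell
#!/bin/sh
# find_lowercase_builtins is a hypothetical helper name.
find_lowercase_builtins() {
    # Print lines (with line numbers) that call one of the listed
    # aggregate functions in lower case.
    grep -nE '(^|[^A-Za-z0-9_])(avg|max|min|sum|count)[[:space:]]*\(' "$1"
}

# Demo script built from the statements quoted in this thread.
demo=$(mktemp)
cat > "$demo" <<'EOF'
c = GROUP b ALL;
d = FOREACH c GENERATE avg(b.stock_volume);
e = FOREACH c GENERATE MAX(b.stock_volume);
EOF
find_lowercase_builtins "$demo"
```

The check is deliberately simple: it flags all-lower-case calls only, so a mixed-case spelling such as Avg would still need a manual look.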

    #19759
    tedr
    Member

    Hi Kirill,

    Thanks for letting us know.

    Ted.

    #21572

    Wondering what was the resolution for this problem? I just encountered the same problem.

    Thanks

    #21740
    Larry Liu
    Moderator

    Hi, Mehran

    Here is the workaround:

    1. Log in to the sandbox as root.
    2. Edit the file /usr/bin/pig.
    3. Replace the last line with the following line:
    exec /usr/lib/pig/bin/pig -useHCatalog "$@"

    Larry

    #21763

    Larry,
    Thanks a lot for your response. I actually did that, as also mentioned in another post, and restarted the host, but nothing happens when I execute the script, and I get the error below when checking the syntax. Any idea what’s going on?
    I appreciate your help,
    Mehran

    2013-04-15 12:12:28,030 [main] INFO org.apache.pig.Main – Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
    2013-04-15 12:12:28,031 [main] INFO org.apache.pig.Main – Logging error messages to: /home/sandbox/hue/pig_1366053148027.log
    2013-04-15 12:12:32,601 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: file:///
    2013-04-15 12:12:35,471 [main] WARN org.apache.hadoop.hive.conf.HiveConf – DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    2013-04-15 12:12:35,897 [main] INFO hive.metastore – Trying to connect to metastore with URI thrift://sandbox:9083
    2013-04-15 12:12:36,485 [main] INFO hive.metastore – Waiting 1 seconds before next connection attempt.
    2013-04-15 12:12:37,485 [main] INFO hive.metastore – Connected to metastore.
    Unexpected character ‘”‘
    2013-04-15 12:12:41,153 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1200: Unexpected character ‘”‘
    Details at logfile: /home/sandbox/hue/pig_1366053148027.log

    Here is the code I am running:
    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol=="IBM";
    c = group b all;
    d = FOREACH c GENERATE AVG(b.stock_volume);
    dump d;

    #21775

    I had IBM in double quotes instead of single quotes. After changing to single quotes, I do get a result. Thanks to Ron at Hortonworks, who shared his code with me.

    Thanks everyone!

    #21866
    Larry Liu
    Moderator

    Hi, Mehran

    It is great that you got it working.

    We are all happy learning 😉

    Larry

    #23576

    Has anyone found the resolution to this issue? It looks like we get it first with HCat, then with Pig. It must be a config problem, but I can’t even get to the directory recommended above. My /etc/bin/ does not have a pig in it on the newest sandbox, downloaded on 4/26/13. I’m happy to try recommendations if folks can reply. I’ll post the results here.

    #23758
    Yi Zhang
    Moderator

    Hi Adam,

    There is no /etc/bin/pig, do you mean /usr/lib/pig/bin/pig or /usr/bin/pig?

    Try Peter’s suggestion:

    sed -i '49s/.*/includeHCatalog=true;/' /usr/lib/pig/bin/pig

    This will load HCatalog automatically when pig is started.

    Thanks,
    Yi

    #24582
    Craig Rowley
    Member

    I was receiving the error for newly downloaded HDP Sandbox (downloaded today) when trying to execute tutorial1, check syntax, explain.
    (using Mac OSX 10.6.8, Virtualbox 4.2.4)

    I used terminal and ssh’d into the VM (using IP from url and root/hadoop as auth)

    I applied the fix to /usr/bin/pig (to add -useHCatalog to last line)

    It worked, as far as I can tell

    #24682
    Larry Liu
    Moderator

    Hi, Craig

    Thanks for trying it and letting us know it worked.

    Please let us know about any issues you run into.

    Larry

    #26891
    ETL Doop
    Member

    2013-06-03 16:43:59,751 [main] INFO org.apache.pig.Main – Apache Pig version 0.10.1.21 (rexported) compiled Jan 10 2013, 04:00:42
    2013-06-03 16:43:59,752 [main] INFO org.apache.pig.Main – Logging error messages to: /home/sandbox/hue/pig_1370303039747.log
    2013-06-03 16:43:59,998 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox:8020
    2013-06-03 16:44:00,412 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to map-reduce job tracker at: sandbox:50300
    2013-06-03 16:44:00,896 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    Details at logfile: /home/sandbox/hue/pig_1370303039747.log

    1. Downloaded the Version today..
    2. username / pwd for the Virtual version

    Please assist.

    #26892
    Sasha J
    Moderator

    Dear Doop,
    Do you mean to say “what is username and password for Sandbox”?
    Or you meant something else?
    username is root
    password is hadoop

    Hope this helps!
    Thank you!
    Sasha

    #26894
    ETL Doop
    Member

    Thanks Sarah for login pwd…Logged in as root.

    I am unable to access the pig folder. How do I resolve the HCatLoader issue? Please let me know.

    #26941
    ETL Doop
    Member

    The issue is resolved. I logged in and replaced the last line in the /usr/bin/pig file with:

    exec /usr/lib/pig/bin/pig -useHCatalog "$@"

    I followed Kirill Wood’s steps and it worked. Thank you.

    #26974
    tedr
    Moderator

    Hi Doop,

    Thanks for letting us know that you figured it out.

    Ted.

    #28492

    Hello Ted/Sasha/Larry, I followed the workaround suggestions. Now receiving a different error. Please help.

    2013-06-27 11:12:27,054 [main] INFO org.apache.pig.Main – Apache Pig version 0.11.1.1.3.0.0-107 (rexported) compiled May 20 2013, 03:04:35
    2013-06-27 11:12:27,059 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/mapred/taskTracker/hue/jobcache/job_201306271056_0001/attempt_201306271056_0001_m_000000_0/work/pig_1372356747052.log
    2013-06-27 11:12:27,334 [main] INFO org.apache.pig.impl.util.Utils – Default bootup file /usr/lib/hadoop/.pigbootup not found
    2013-06-27 11:12:27,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox:8020
    2013-06-27 11:12:27,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to map-reduce job tracker at: sandbox:50300
    2013-06-27 11:12:28,167 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.ping.HCataloader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201306271056_0001/attempt_201306271056_0001_m_000000_0/work/pig_1372356747052.log

    #28615
    tedr
    Moderator

    Hi Nilesh,

    Looking at this line from your post:

    2013-06-27 11:12:28,167 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.ping.HCataloader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    it looks like the problem is a misspelling of the class name: it should be ‘HCatLoader’, not ‘HCataloader’. Note the extra ‘a’ in what you entered and the need for an uppercase ‘L’.

    Thanks,
    Ted.
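
The class name in the LOAD statement has to match org.apache.hcatalog.pig.HCatLoader character for character; in the log line quoted above the package segment also reads ping rather than pig. A small sketch for catching such typos, assuming the script is saved to a file; check_loader_name is a hypothetical helper, and the expected name is the one used in the tutorial scripts in this thread.

```shell
#!/bin/sh
# Expected loader class, as written in the tutorial scripts in this thread.
EXPECTED='org.apache.hcatalog.pig.HCatLoader'

# check_loader_name is a hypothetical helper name.
check_loader_name() {
    # Extract every dotted name starting with org.apache.hcatalog and
    # report the ones that differ from the expected loader class.
    grep -oE 'org\.apache\.hcatalog[A-Za-z0-9_.]*' "$1" | while read -r name; do
        if [ "$name" != "$EXPECTED" ]; then
            echo "suspicious class name: $name"
        fi
    done
}

# Demo using the misspelled name from the error message.
demo=$(mktemp)
echo "a = LOAD 'nyse_stocks' USING org.apache.hcatalog.ping.HCataloader();" > "$demo"
check_loader_name "$demo"
```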

    #41085

    I think VirtualBox does cause errors on occasion when importing the 2.0 beta sandbox. I had the same error and applied the fixes as described by Larry and Peter, to no effect. After I re-imported the VM the query started working without needing the fixes. Thanks Oracle, not.

    #43682

    Hi,

    I am trying to run Tutorial #2. I followed each step very carefully, but when I click Execute I get the following error.

    Job job_1384288052635_0017 was failed

    Here is the script I try to run:

    batting = load 'Batting.csv' using PigStorage(',');
    runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
    grp_data = GROUP runs by (year);
    max_runs = FOREACH grp_data GENERATE group as grp,MAX(runs.runs) as max_runs;
    join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
    join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
    dump join_data;

    When I click on Explain I get this message:

    #-----------------------------------------------
    # New Logical Plan:
    #-----------------------------------------------
    Logical plan is empty.

    When I run syntax check, everything seems normal.

    Please help.

    Thanks

    #44734

    I’m also having trouble with the tutorial1 pig script. When I execute it, I get no results back. The job reports success, but the box remains empty. The syntax check passes, while the explain plan fails without presenting any logs. I’ve tried this with Chrome and with Internet Explorer.

    I am running a fresh Hortonworks+Sandbox+2.0+VMware instance with the latest VMware Player. Here is the code I’m running:

    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = GROUP b ALL;
    d = FOREACH c GENERATE SUM(b.stock_volume);
    DUMP d;

    Any thoughts? Thanks!

    Kevin

    #48396
    one2go
    Participant

    I encountered this problem (“ERROR 1070: Could not resolve org.apache.hcatalog.ping.HCataloader”) after downloading Hortonworks+Sandbox+2.0+VMware.ova on Feb 8, 2014. The fixes suggested earlier in this thread did not solve the problem when using the browser-based Hue editor. (They did solve the problem when running pig from the command line on the sandbox.)

    I was able to get it working by patching the following file.

    sed -i 's/^includeHCatalog=.*/includeHCatalog=true;/' /hadoop/yarn/local/filecache/10/pig.tar.gz/pig/bin/pig
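
The 10 in the path above looks like a cache directory name and may well differ on another install (that is an assumption; the post does not say). A sketch that locates every cached copy of the launcher before patching; find_pig_launchers is a hypothetical helper, and the default root is simply taken from the path in the post.

```shell
#!/bin/sh
# find_pig_launchers is a hypothetical helper name.
find_pig_launchers() {
    root="${1:-/hadoop/yarn/local/filecache}"
    # Match any regular file whose path ends in pig/bin/pig under the root.
    find "$root" -type f -path '*/pig/bin/pig' 2>/dev/null
}

# Demo against a scratch directory laid out like the path in the post.
demo=$(mktemp -d)
mkdir -p "$demo/10/pig.tar.gz/pig/bin"
echo 'includeHCatalog="";' > "$demo/10/pig.tar.gz/pig/bin/pig"
find_pig_launchers "$demo"
```

Each path it prints could then be patched with the sed command from the post.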

    #48729
    Nicolas MASSART
    Participant

    I had the same issue, and I found out what is happening!
    The problem occurs if the parameter “-useHCatalog” is not set.
    If you only type the parameter into the text field of the HDP Pig interface, it is not taken into account; you need to press the Enter key.

    #49734
    Maddy Techie
    Participant

    Hi, I tried the basic tutorial, but it’s not working. On Execute it completes successfully, but the output is blank.
    Below are my code and log file.

    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = filter a by stock_symbol == 'IBM';
    c = group b all;
    d = foreach c generate AVG(b.stock_volume);
    dump d;
    ---------------
    log file

    2014-03-06 16:56:17,294 [main] INFO org.apache.pig.Main – Apache Pig version 0.12.0.2.0.6.0-76 (rexported) compiled Oct 17 2013, 20:44:07
    2014-03-06 16:56:17,322 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/yarn/local/usercache/hue/appcache/application_1394153146773_0001/container_1394153146773_0001_01_000002/pig_1394153777285.log
    2014-03-06 16:56:35,564 [main] INFO org.apache.pig.impl.util.Utils – Default bootup file /home/yarn/.pigbootup not found
    2014-03-06 16:56:37,061 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2014-03-06 16:56:37,062 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-03-06 16:56:37,063 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
    2014-03-06 16:56:37,113 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
    2014-03-06 16:56:44,048 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – dfs.df.interval is deprecated. Instead, use fs.df.interval
    2014-03-06 16:56:44,049 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
    2014-03-06 16:56:44,049 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
    2014-03-06 16:56:44,050 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
    2014-03-06 16:56:44,051 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – hadoop.native.lib is deprecated. Instead, use io.native.lib.available
    2014-03-06 16:56:44,053 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
    2014-03-06 16:56:44,053 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
    2014-03-06 16:56:44,055 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
    2014-03-06 16:56:44,056 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.reduce.parallel.copi

    #50613
    suaroman
    Participant

    Just checking to see if there are any updates?
    I have the latest 2.x sandbox and experiencing the same problem.

    None of the recommended solutions has worked.

    This command: (from Larry)
    sed -i '49s/.*/includeHCatalog=true;/' /usr/lib/pig/bin/pig

    didn’t resolve the problem

    nor did manually going to pig file and making the adjustments work.

    I don’t think this problem is related to the VM. I’m using VMWare workstation 10.

    If you need anything, i’ll be happy to send you my files or any of the configurations I have.
    thanks,

    #51537
    Don Estes
    Participant

    Thanks Nicolas! That worked for me when all the other work arounds didn’t.

    #53871
    Ahmed Hashmi
    Participant

    Hi,

    I have just installed the Sandbox and ran the basic Pig program. It runs successfully but produces no output, and the logs do not show any error.
    Please help to resolve this.

    ls: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory
    2014-05-16 23:14:27,247 [main] INFO org.apache.pig.Main – Apache Pig version 0.12.1.2.1.1.0-385 (rexported) compiled Apr 16 2014, 15:59:00
    2014-05-16 23:14:27,249 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/yarn/local/usercache/hue/appcache/application_1400296543712_0013/container_1400296543712_0013_01_000002/pig_1400307267243.log
    2014-05-16 23:14:28,706 [main] INFO org.apache.pig.impl.util.Utils – Default bootup file /home/yarn/.pigbootup not found
    2014-05-16 23:14:29,677 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2014-05-16 23:14:29,678 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:29,678 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
    2014-05-16 23:14:31,016 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,094 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,163 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,227 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,293 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,300 [main] WARN org.apache.pig.PigServer – Empty string specified for jar path
    2014-05-16 23:14:31,391 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,558 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,760 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,838 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:31,969 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-05-16 23:14:34,773 [main] INFO hive.metastore – Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
    2014-05-16 23:14:35,133 [main] INFO hive.metastore – Connected to metastore.
    2014-0

    #55563
    Anand Venkatraman
    Participant

    I was able to successfully complete tutorial 1 (nyse_stock). However, I keep getting a bunch of errors with tutorial 2 (batting.csv)

    ls: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory
    2014-06-10 20:31:04,641 [main] INFO org.apache.pig.Main – Apache Pig version 0.12.1.2.1.1.0-385 (rexported) compiled Apr 16 2014, 15:59:00
    2014-06-10 20:31:04,642 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/yarn/local/usercache/hue/appcache/application_1402456675484_0003/container_1402456675484_0003_01_000002/pig_1402457464639.log
    2014-06-10 20:31:05,503 [main] INFO org.apache.pig.impl.util.Utils – Default bootup file /home/yarn/.pigbootup not found
    2014-06-10 20:31:05,656 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2014-06-10 20:31:05,656 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:05,656 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
    2014-06-10 20:31:06,288 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,324 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,356 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,389 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,416 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,419 [main] WARN org.apache.pig.PigServer – Empty string specified for jar path
    2014-06-10 20:31:06,490 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,539 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,592 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    2014-06-10 20:31:06,641 [main] INFO org.apache.hadoop.conf.Configuration.deprecation – fs.default.name is deprecated. Instead, use fs.defaultFS
    <file script.pig, line 1, column 15> Unexpected character ‘‘’
    2014-06-10 20:31:06,828 [main] ERROR org.apache.pig.PigServer – exception during parsing: Error during parsing. <file script.pig, line 1, column 15> Unexpected character ‘‘’
    Failed to parse: <file script.pig, line 1, column 15> Unexpected character ‘‘’
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:243)
    at org.apache.pi

    #55728
    VJ Jain
    Participant

    I have the same problem with SandBox 2.1 for all Pig tutorials that use PigStorage or reference HCatalog. I have tried numerous fixes on here and looked for more recent Pig scripts to see what differences there are. I noticed that some of the newer scripts use the full virtual path for the load. For example, STOCK_A = LOAD '/user/hue/NYSE_daily_prices_A.csv' using PigStorage(','); instead of STOCK_A = LOAD 'NYSE_daily_prices_A.csv' using PigStorage(','). This did not resolve my problem. Furthermore, while the job errors, it produces no log files. I have tried to locate them on the server but they don’t exist. There is obviously some issue with the 2.1 SandBox making the earlier Pig tutorials invalid because none of them work. I read the more recent posts on this thread and I am in the exact same position. The earlier solutions might have worked for earlier distributions but not for 2.1. Can someone please let us all know how to get the Pig scripts working on 2.1?

    #56105
    Owen Taylor
    Participant

    I think I have a fix:
    HCAT_HOME needs to be set properly to
    /usr/lib/hive-hcatalog

    One way to do this is to Edit the following file:

    /usr/lib/pig/bin/pig

    So that this line
    if [ "$HCAT_HOME" == "" ]; then

    is preceded by this line:

    HCAT_HOME=/usr/lib/hive-hcatalog

    So I have the following:

    #### Owen otaylor@hortonworks.com Added this next line to enable finding HCATALOG Jars etc:

    HCAT_HOME=/usr/lib/hive-hcatalog

    if [ "$HCAT_HOME" == "" ]; then
        if [ -d "/usr/lib/hcatalog" ]; then
            HCAT_HOME=/usr/lib/hcatalog
        else
            echo "Please initialize HCAT_HOME"
            exit -1
        fi
    fi
    hcatJarPath=`ls $HCAT_HOME/share/hcatalog/$hcatJar`

    etc…

    After doing this the following works for me:

    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = GROUP b ALL;
    d = FOREACH c GENERATE AVG(b.stock_volume);
    store d INTO '/wip/pigtest/1';

    Cheers,

    Owen.
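
Hard-coding HCAT_HOME works on a sandbox where /usr/lib/hive-hcatalog exists. A slightly more defensive variant, sketched below on the assumption that the two directories named in this post are the only candidates, keeps a value that is already set and otherwise picks the first candidate directory that exists. resolve_hcat_home is a hypothetical helper, not part of the Pig launcher.

```shell
#!/bin/sh
# resolve_hcat_home is a hypothetical helper name.
resolve_hcat_home() {
    # Keep an existing setting; otherwise try each candidate in order.
    if [ -n "$HCAT_HOME" ]; then
        echo "$HCAT_HOME"
        return 0
    fi
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "Please initialize HCAT_HOME" >&2
    return 1
}

# On the sandbox the call might look like this (paths taken from the post):
#   HCAT_HOME=$(resolve_hcat_home /usr/lib/hive-hcatalog /usr/lib/hcatalog) || exit 1

# Demo with scratch directories so that it runs anywhere.
unset HCAT_HOME
demo=$(mktemp -d)
mkdir "$demo/hcatalog"
resolve_hcat_home "$demo/hive-hcatalog" "$demo/hcatalog"
```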

    #56154
    Amit Likhyani
    Participant

    Owen, I downloaded the latest sandbox 2 days ago and implemented the fix. Same error. Anyone else having the same issues?

    #56174
    Owen Taylor
    Participant

    It should be noted that the exceptions and issues in this thread are not all the same. I mention this because I don’t want folks to be confused into thinking this is harder than it should be or more broken than it is. :)
    Bottom line: carefully read the stack traces for clues regarding which error you are facing.
    [Sometimes there is no error at all, just warnings that look alarming but can be ignored. The data may have been written to the output directory as you requested and you just need to go find it!]

    Types of errors exposed in this thread:

    * Security exceptions due to copy-paste errors where quotes are not regular " " quotes. Solution: check your scripts for weird symbols/characters and hand-type the correct ones if they look suspicious.

    * Missing library issues (this was the one I addressed in my earlier post by adding HCAT_HOME=/usr/lib/hive-hcatalog to the pig launcher script, /usr/lib/pig/bin/pig).

    * Jar file typos: “Hcatatalog” or similar typos prevent the JVM from loading the correct jar. Solution: double-check your keywords, jar file names, etc.

    * Formatting issues within the pig script, such as using " in place of ' or using = instead of ==. Solution: read the exception text and carefully examine all your code.

    I think that pretty much covers it. Hopefully, that adds some clarity going forward.

    Cheers,

    Owen.
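
    For the quote-related errors in the list above, something like the following could clean up a pasted script before it is run. This is only a sketch: normalize_quotes is a made-up name, and it handles just the curly quotes and prime marks that show up in pasted snippets, not every odd character.

    ```shell
    # Hypothetical helper: replace typographic quotes with the plain ASCII
    # quotes used for Pig string literals. Reads stdin, writes stdout.
    normalize_quotes() {
        sed -e "s/‘/'/g" -e "s/’/'/g" -e "s/′/'/g" \
            -e 's/“/"/g' -e 's/”/"/g'
    }
    ```

    It could be used as normalize_quotes < script.pig > script.fixed.pig (both file names are placeholders); reading the result over by eye is still worthwhile.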

    #57068
    Charles Ibrahim
    Participant

    Hi,
    I got the same error, still.

    When typing:

    a = LOAD 'default.sample_07' USING org.apache.hcatalog.pig.HCatLoader();

    I got:

    ls: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory
    2014-07-10 01:55:56,678 [main] INFO org.apache.pig.Main – Apache Pig version 0.12.1.2.1.1.0-385 (rexported) compiled Apr 16 2014, 15:59:00
    2014-07-10 01:55:56,680 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/yarn/local/usercache/hue/appcache/application_1404978936821_0004/container_1404978936821_0004_01_000002/pig_1404982556676.log
    2014-07-10 01:55:58,231 [main] ERROR org.apache.pig.Main – ERROR 2997: Encountered IOException. File HCatalog->load template does not exist
    Details at logfile: /hadoop/yarn/local/usercache/hue/appcache/application_1404978936821_0004/container_1404978936821_0004_01_000002/pig_1404982556676.log

    #57267
    Howard Dierking
    Participant

    I noticed that there hasn’t been any update on this thread for quite a while and I just hit the same issue today. Was there a work around discovered other than modifying /usr/bin/pig to add the -useHCatalog flag to the pig command?

    #57391
    Tom Benton
    Moderator

    Howard, are you still having issues? I just went through the tutorial and was able to complete it successfully. Do you have the latest version of the sandbox and tutorials installed?

    The -useHCatalog option can be added using HUE and the Pig prompt.

    #57718
    Sagar Allamdas
    Participant

    Just go to the Sandbox shell and type:
    pig -useHCatalog
    Then enter your Pig script at the prompt.

    it works fast and great
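
    A minimal sketch of running a saved script this way from the shell instead of typing it at the prompt. The wrapper name run_pig_hcat and the PIG_BIN override are assumptions for illustration; the only parts taken from Pig itself are the -useHCatalog flag discussed in this thread and the -f option for naming a script file.

    ```shell
    # Hypothetical wrapper: run a Pig script file with HCatalog support enabled.
    # PIG_BIN is an assumed override so a different pig launcher can be used.
    run_pig_hcat() {
        script=$1
        if [ ! -f "$script" ]; then
            echo "script not found: $script" >&2
            return 1
        fi
        # Flag spelled as in the moderator's reply: -useHCatalog
        "${PIG_BIN:-pig}" -useHCatalog -f "$script"
    }
    ```

    With a script saved as, say, avg_volume.pig (a placeholder name), the call would be run_pig_hcat avg_volume.pig.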

    #57719
    Howard Dierking
    Participant

    Yes, I added the -useHcatalog statement and it was still not working. Ended up getting unblocked by just running pig in local mode on my machine, so will circle back to the sandbox at some point in the future.

    #58724
    Dan Macklin
    Participant

    Hi,

    If anyone is still getting no output when running this job, try clicking the cross next to the check option in the Pig arguments area below the Pig script window.

    This option was set on my sandbox; once I removed it, everything worked great.

    Thanks

    Dan

    #59834

    Hi.
    As Nicolas MASSART said, after inserting the parameter -useHCatalog, you need to press the Enter key. That worked for me.

    #59847
    Manish Dubey
    Participant

    Hello All,

    I am facing a similar issue while executing the first Pig script example, shown below. I am new to this. Could anyone help me with what needs to be done to resolve it?

    Earlier I was getting the error ‘ls: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory’. I copied the file into that directory but still get the same error 1070.

    a = LOAD 'nyse_stock' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'ASP';
    c = GROUP b all;
    d = FOREACH c GENERATE AVG(b.stock_volume);
    dump d;

    Caused by:
    <file script.pig, line 1, column 28> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299)
    at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284)
    at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3515)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    … 18 more
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296)
    … 26 more
    2014-09-04 04:36:48,973 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

    Thanks in Anticipation!!!!!
    Manish

    #64662
    David Fonseca
    Participant

    Hello, I’m new to this and stuck on the “helldoop” error 1070 using Pig.
    1. I’ve downloaded HDP 2.2.
    2. I’ve modified the file /usr/bin/pig as Larry says (no luck).
    3. I’ve tried to run the sed command sed -i '49s/.*/includeHCatalog=true;/' /usr/lib/pig/bin/pig (Error: can’t read /usr/lib/pig/bin/pig: No such file or directory).
    4. I added “-useHCatalog” to the Pig arguments section (same horrible 1070 error).
    5. I’ve googled with no luck.

    I know that this should be super easy, but it is not working.

    Does anybody know how to solve this error? Your help will be much appreciated!!!

    #64664
    David Fonseca
    Participant

    Hello, I finally found all the issues.
    I posted the steps to solve all of them at the following link: http://idavit.blogspot.mx/2014/12/como-no-morir-en-el-intento-primer.html

    Thanks to Arthur; the posts he made were a light in the dark!!!

    #65223
    mindtd
    Participant

    Had ERROR 1070, followed D. Fonseca’s steps.

    Got a new error, ERROR 2997:
    15/01/12 01:02:56 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    15/01/12 01:02:56 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    15/01/12 01:02:56 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
    2015-01-12 01:02:56,284 [main] INFO org.apache.pig.Main – Apache Pig version 0.14.0.2.2.0.0-2041 (rexported) compiled Nov 19 2014, 15:24:46
    2015-01-12 01:02:56,287 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/yarn/local/usercache/hue/appcache/application_1421017583604_0004/container_1421017583604_0004_01_000002/pig_1421024576282.log
    2015-01-12 01:03:05,270 [main] ERROR org.apache.pig.Main – ERROR 2997: Encountered IOException. File useHCatalog does not exist
    Details at logfile: /hadoop/yarn/local/usercache/hue/appcache/application_1421017583604_0004/container_1421017583604_0004_01_000002/pig_1421024576282.log
    2015-01-12 01:03:06,119 [main] INFO org.apache.pig.Main – Pig script completed in 10 seconds and 164 milliseconds (10164 ms)

    I then tried the following to see if it would work:
    1. Log in to the sandbox as root
    2. Edit /usr/bin/pig
    3. Replace the last line with the following line:
    exec /usr/lib/pig/bin/pig -useHCatalog "$@"

    That did not work; the log was empty (nothing) when the run failed. Any workaround?

    #65228
    mindtd
    Participant

    Additional info: I went through the same Pig exercise on HDP 1.3, HDP 2.1 and HDP 2.2, and completed the tutorial on 1.3 and 2.1.

    HDP 2.2 is the one where I encountered ERROR 2997.

    #65256
    Gopi Dhaks
    Participant

    Same issue here. Any workaround?
    I appreciate your help.

    #65329
    Andrew Taylor
    Participant

    Try changing the first line to
    a = LOAD 'nyse_stocks' USING org.apache.hive.hcatalog.pig.HCatLoader();

    (add “.hive” after “.apache”.)
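
    If several saved scripts still use the old class name, the rename suggested above could be applied with sed. A small sketch, assuming the only change needed is the package prefix; use_hive_hcat_loader is a hypothetical name.

    ```shell
    # Hypothetical helper: rewrite the old HCatLoader package name to the
    # org.apache.hive.hcatalog one. Reads stdin, writes stdout. Lines that
    # already use the new name are left unchanged.
    use_hive_hcat_loader() {
        sed 's/org\.apache\.hcatalog\.pig\.HCatLoader/org.apache.hive.hcatalog.pig.HCatLoader/g'
    }
    ```

    For instance, use_hive_hcat_loader < old.pig > new.pig, where both file names are placeholders.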

    #65873
    mousaid
    Participant

    Add hive after org.apache:
    a = LOAD 'nyse_stocks' USING org.apache.hive.hcatalog.pig.HCatLoader();
    b = FILTER a by stock_symbol == 'IBM';
    c = group b all;
    d = foreach c generate AVG(b.stock_volume);
    dump d;

    Save, then execute. If it does not work, follow the steps in this blog post: http://idavit.blogspot.mx/2014/12/como-no-morir-en-el-intento-primer.html

    #71793
    Rajeev Trikha
    Participant

    It certainly isn’t. I had to change the first line to the following to make it work:

    a = LOAD 'nyse_stocks' USING org.apache.hive.hcatalog.pig.HCatLoader();

    The job succeeds but the log has many warnings about deprecated methods. Results are fine. As a novice I would expect the sandbox to behave straight out of the box.

    #73447
    N U M
    Participant

    Hi,

    I am a novice in the Hadoop environment. I installed the Hadoop 2.2.4.2 VirtualBox version of the Sandbox on Windows 7.

    I imported the NYSE-2000-2001.tsv file into the Hue File Browser successfully.
    I created a table nyse_stocks from that file in Hue HCatalog.
    I am able to run the sample Hive query in the Beeswax (Hive UI) interface.

    BUT I am not able to run the sample Pig script; it throws the error below.

    ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
    ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/commons-lang3-*.jar: No such file or directory
    ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
    etc…
    ….

    I do have “hive” after org.apache in the first line of the sample Pig script.
    When I tried to fix it using the steps at http://idavit.blogspot.mx/2014/12/como-no-morir-en-el-intento-primer.html, I found that I don’t have the file /apps/webhcat/hive.tar.gz. Is it version-specific? Am I missing something in the installation/configuration?

    The Sandbox’s sample script should work out of the box, but for some reason it does not. Any help is appreciated. Thanks.

    Regards,
    Ravi
