Hortonworks Sandbox Forum

MapReduce

  • #30974
    Participant

    Hi,

    Could anyone please tell me how I can run word count or a similar MapReduce example using the Hortonworks Sandbox? It would be great to have the exact steps to do that, both with an IDE and from the console. I appreciate your help.
    Thanks
    Roy


  • #31002
    Cheryle Custer
    Moderator

    Hi,

    Take a look at this blog: http://www.jumpingthroughthehadoop.com/2013/07/29/how-to-word-count-with-pig-and-the-hortonworks-sandbox/

    Looks like another user has figured this out and created an entry for it.

    Cheryle
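
    For anyone landing here later, the word count described in that blog post boils down to a short Pig script along these lines. This is only a minimal sketch; the input and output paths (/user/hue/wordcount/input.txt and /user/hue/wordcount/output) are placeholders, not paths from the tutorial.

    -- Minimal word-count sketch in Pig; the paths below are placeholders.
    lines  = LOAD '/user/hue/wordcount/input.txt' AS (line:chararray);
    -- Split each line into words and flatten to one word per record.
    words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    -- Group identical words and count each group.
    grpd   = GROUP words BY word;
    counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;
    STORE counts INTO '/user/hue/wordcount/output';

    The same script should work either from the Pig editor in the Sandbox web UI or from the grunt shell on the console.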

    #31015
    Participant

    Hi Cheryle,

    Thanks a lot. I can run the program with Pig and Hive now. I ran the following script in Pig, as given in Tutorial 1, but I am not getting the results in the green box. Also, the progress bar looks red (not green as shown in the tutorial) after the query completes. I can only see the log. Any help?

    a = LOAD nyse-stocks USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c=GROUP b all;
    d = foreach c generate AVG(b.stock_volume);
    dump d;

    #31130
    Sasha J
    Moderator

    Did you load the file nyse-stocks to HDFS?
    The file name in the LOAD statement should be enclosed in '' (single quotes).
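
    In other words, the first line would look something like this (shown only as an illustration of the quoting, keeping the table name from the original script):

    a = LOAD 'nyse-stocks' USING org.apache.hcatalog.pig.HCatLoader();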

    What does the log say?

    Thank you!
    Sasha

    #31132
    Participant

    Hi Sasha,

    Thanks a lot for the reply.

    Yes, I did. But the progress bar still looks completed, only in red. Nothing showed up in the green box, and the result file (.txt) is also empty. I can access the logs (given below). The same thing happened with Tutorials 3 and 4 with Pig and Hive. Is there anything wrong with the installation? I can see the console, and everything looks OK as shown in the Sandbox installation guide.

    Please give me a hint.

    Thanks a lot.
    Roy
    Log files given:

    2013-08-08 12:22:24,826 [main] INFO org.apache.pig.Main – Apache Pig version 0.11.1.1.3.0.0-107 (rexported) compiled May 20 2013, 03:04:35
    2013-08-08 12:22:24,827 [main] INFO org.apache.pig.Main – Logging error messages to: /hadoop/mapred/taskTracker/hue/jobcache/job_201308070729_0012/attempt_201308070729_0012_m_000000_0/work/pig_1375989744822.log
    2013-08-08 12:22:25,707 [main] INFO org.apache.pig.impl.util.Utils – Default bootup file /usr/lib/hadoop/.pigbootup not found
    2013-08-08 12:22:26,282 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to hadoop file system at: hdfs://sandbox:8020
    2013-08-08 12:22:38,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine – Connecting to map-reduce job tracker at: sandbox:50300
    2013-08-08 12:22:41,024 [main] WARN org.apache.hadoop.hive.conf.HiveConf – DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    2013-08-08 12:22:41,483 [main] INFO hive.metastore – Trying to connect to metastore with URI thrift://sandbox:9083
    2013-08-08 12:22:41,859 [main] INFO hive.metastore – Waiting 1 seconds before next connection attempt.
    2013-08-08 12:22:42,860 [main] INFO hive.metastore – Connected to metastore.
    2013-08-08 12:22:43,420 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1115: Table not found : default.nyse-stocks table not found
    Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201308070729_0012/attempt_201308070729_0012_m_000000_0/work/pig_1375989744822.log

    #31153
    Sasha J
    Moderator

    Did you create a table from the loaded file?

    The error is:

    2013-08-08 12:22:43,420 [main] ERROR org.apache.pig.tools.grunt.Grunt – ERROR 1115: Table not found : default.nyse-stocks table not found

    In order to load data with Pig through HCatLoader, the table definition should be created in HCatalog first…

    Sasha

    #49766
    camoor
    Participant

    I had the same problem. You're looking for nyse-stocks (with a dash); you need to look for nyse_stocks (with an underscore). Hope that helps!
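
    Putting the two suggestions together (quoting the name in the LOAD statement and using the underscore variant), the script would look roughly like this. This is a sketch that assumes the nyse_stocks table was already created in HCatalog by the tutorial setup:

    -- Assumes the nyse_stocks table already exists in HCatalog.
    a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = GROUP b ALL;
    d = FOREACH c GENERATE AVG(b.stock_volume);
    DUMP d;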

