Hortonworks Sandbox Forum

MapReduce

  • #30974


    Could anyone please tell me how to run the word count example (or a similar MapReduce example) on the Hortonworks Sandbox? Exact steps for both an IDE and the console would be great. I appreciate your help.
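For the console route, one common option is Hadoop Streaming, which runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout as the mapper and the reducer. The sketch below is a hypothetical Python word count in that style. It is not taken from the Sandbox tutorials, and the names `mapper`, `reducer`, `local_word_count` and the file name `wordcount.py` are illustrative. A local `sorted()` call stands in for Hadoop's shuffle/sort so the map and reduce logic can be checked without a cluster.

```python
import sys
from itertools import groupby
from typing import Dict, Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_pairs: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts for each word.

    Assumes the input is sorted by key, which Hadoop's shuffle/sort provides.
    Blank lines are skipped.
    """
    keyed = (p.rstrip("\n").split("\t", 1) for p in sorted_pairs if p.strip())
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def local_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in one process (no cluster)."""
    shuffled = sorted(mapper(lines))
    return {w: int(c) for w, c in (r.split("\t") for r in reducer(shuffled))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: "python3 wordcount.py map" or "python3 wordcount.py reduce".
    # Each stage reads stdin and prints tab-separated lines, as Streaming expects.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

On the cluster, the two stages would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options, together with `-input`, `-output` and `-file`. The jar's path depends on the HDP version, so it is left out here.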


  • #31002
    Cheryle Custer


    Take a look at this blog: http://www.jumpingthroughthehadoop.com/2013/07/29/how-to-word-count-with-pig-and-the-hortonworks-sandbox/

    Looks like another user has figured this out and created an entry for it.



    Hi Cheryle,

    Thanks a lot. I can run the program with Pig and Hive now. I ran the following script in Pig, as given in Tutorial 1, but I am not getting any results in the green box. Also, after the query completes, the progress bar is red (not green as shown in the tutorial). I can see only the log. Any help?

    a = LOAD nyse-stocks USING org.apache.hcatalog.pig.HCatLoader();
    b = FILTER a BY stock_symbol == 'IBM';
    c = GROUP b all;
    d = foreach c generate AVG(b.stock_volume);
    dump d;
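For reference, the script keeps only the IBM rows, puts them all into one bag with `GROUP ... all`, and averages `stock_volume` over that bag. The plain-Python equivalent below shows what the green result box should contain once the load succeeds. It is a sketch that assumes only the `stock_symbol` and `stock_volume` columns matter. The helper name `avg_volume` and the sample rows are made up for illustration and are not the tutorial's data.

```python
from statistics import mean
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[str, int]]


def avg_volume(rows: List[Row], symbol: str) -> Optional[float]:
    # b = FILTER a BY stock_symbol == 'IBM';
    matching = [r for r in rows if r["stock_symbol"] == symbol]
    if not matching:
        return None  # no matching rows, so there is nothing to average
    # c = GROUP b all;  d = foreach c generate AVG(b.stock_volume);
    # float() roughly mirrors Pig's AVG, which returns a double.
    return float(mean(int(r["stock_volume"]) for r in matching))


# Hypothetical sample rows, not from the NYSE tutorial file.
sample = [
    {"stock_symbol": "IBM", "stock_volume": 100},
    {"stock_symbol": "IBM", "stock_volume": 300},
    {"stock_symbol": "AAPL", "stock_volume": 999},
]
print(avg_volume(sample, "IBM"))
```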

    Sasha J

    Did you load the nyse-stocks file to HDFS?
    The file name in the LOAD statement should be enclosed in single quotes (' ').

    What does the log say?

    Thank you!


    Hi Sasha,

    Thanks a lot for the reply.

    Yes, I did. The progress bar still shows the job as completed, but in red, and nothing shows up in the green box. The result file (.txt) is also empty. I can access the logs (given below). The same thing happened with Tutorials 3 and 4 in Pig and Hive. Is there anything wrong with the installation? I can see the console, and everything looks OK, as shown in the Sandbox installation guide.

    Please give me a hint.

    Thanks a lot.
    Log output:

    2013-08-08 12:22:24,826 [main] INFO org.apache.pig.Main - Apache Pig version (rexported) compiled May 20 2013, 03:04:35
    2013-08-08 12:22:24,827 [main] INFO org.apache.pig.Main - Logging error messages to: /hadoop/mapred/taskTracker/hue/jobcache/job_201308070729_0012/attempt_201308070729_0012_m_000000_0/work/pig_1375989744822.log
    2013-08-08 12:22:25,707 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /usr/lib/hadoop/.pigbootup not found
    2013-08-08 12:22:26,282 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox:8020
    2013-08-08 12:22:38,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: sandbox:50300
    2013-08-08 12:22:41,024 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    2013-08-08 12:22:41,483 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox:9083
    2013-08-08 12:22:41,859 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
    2013-08-08 12:22:42,860 [main] INFO hive.metastore - Connected to metastore.
    2013-08-08 12:22:43,420 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Table not found : default.nyse-stocks table not found
    Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201308070729_0012/attempt_201308070729_0012_m_000000_0/work/pig_1375989744822.log

    Sasha J

    Did you create a table from the loaded file?

    The error is:

    2013-08-08 12:22:43,420 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Table not found : default.nyse-stocks table not found

    To load data in Pig with HCatLoader, the table definition must first be created in HCatalog.



    I had the same problem. You're looking for nyse-stocks (with a dash); you need to look for nyse_stocks (with an underscore). Hope that helps!
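The dash matters because, assuming the usual Hive/HCatalog rule that a table name contains only letters, digits and underscores, `nyse-stocks` cannot be a table name. The table the tutorial refers to is `nyse_stocks`, and that is the name `HCatLoader` must be given, in single quotes. The helpers below (`is_valid_table_name`, `to_table_name`, `hcat_load_line`) are hypothetical and not part of Pig or HCatalog. They check a name against that assumed rule, map a file-style name to a valid one, and build the corrected `LOAD` line.

```python
import re

# Assumed rule: a Hive/HCatalog table name uses only letters, digits and "_".
_VALID_TABLE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_table_name(name: str) -> bool:
    return _VALID_TABLE.fullmatch(name) is not None


def to_table_name(name: str) -> str:
    """Hypothetical helper: replace every disallowed character with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def hcat_load_line(alias: str, table: str) -> str:
    """Build a Pig LOAD statement for HCatLoader, quoting the table name."""
    if not is_valid_table_name(table):
        raise ValueError(f"not a valid table name: {table!r}")
    return f"{alias} = LOAD '{table}' USING org.apache.hcatalog.pig.HCatLoader();"


print(hcat_load_line("a", to_table_name("nyse-stocks")))
```

A database-qualified name such as `default.nyse_stocks` would need its prefix split off before the check. The sketch does not handle that case.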
