HDFS command goes to cluster, Pig command goes to local filesystem?


This topic contains 1 reply, has 1 voice, and was last updated by  Niels Basjes 8 months, 3 weeks ago.

  • Creator
  • #53725

    Niels Basjes


    I have just installed HDP 2.1.2 on my systems and set up the environment almost the same as what I had with CDH 5.0.0.
    I.e., environment settings like $HADOOP_CONF_DIR and $HADOOP_MAPRED_HOME have been set up correctly (as far as I can tell).
    So if I do:
    hdfs dfs -ls -R /
    I see the files that are on my HDFS.

    I do this:
    $ pig
    2014-05-15 10:41:55,810 [main] INFO org.apache.pig.Main - Apache Pig version (rexported) compiled Apr 27 2014, 16:49:54

    2014-05-15 10:41:56,284 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///

    When I do this

    $ pig -printCmdDebug
    Find hadoop at /usr/bin/hadoop
    dry run:
    HADOOP_CLASSPATH: /home/nbasjes/hadoop-environment-configs/kluster-hdp21:/usr/lib/pig/bin/../conf:/usr/java/default/lib/tools.jar:/etc/hadoop/conf:/usr/lib/pig/bin/../lib/avro-1.7.4.jar:/usr/lib/pig/bin/../lib/avro-ipc-1.7.4-tests.jar:/usr/lib/pig/bin/../lib/avro-mapred-1.7.4.jar:/usr/lib/pig/bin/../lib/jruby-complete-1.6.7.jar:/usr/lib/pig/bin/../lib/json-simple-1.1.jar:/usr/lib/pig/bin/../lib/jython-standalone-2.5.3.jar:/usr/lib/pig/bin/../lib/piggybank.jar:/usr/lib/pig/bin/../pig-
    HADOOP_CLIENT_OPTS: -Xmx1000m -Dpig.log.dir=/usr/lib/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/usr/lib/pig/bin/..
    /usr/bin/hadoop jar /usr/lib/pig/bin/../pig-

    OK, so I manually set these two and ran the command at the bottom.
    $ /usr/bin/hadoop jar /usr/lib/pig/bin/../pig-

    2014-05-15 10:36:00,329 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://node1.kluster.basjes.lan/

    What am I doing wrong?
    Why doesn’t Pig go to my HDFS when run via the normal script?

Viewing 1 reply (of 1 total)


  • Author
  • #53729

    Niels Basjes

    The best guess I have right now is that BOTH my HADOOP_CONF_DIR and /etc/hadoop/conf are in the classpath.
    After I removed all files from /etc/hadoop/conf it suddenly worked.
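
    This matches how the Hadoop client resolves configuration: core-site.xml is loaded from the first classpath entry that contains one, so a stale /etc/hadoop/conf listed ahead of your $HADOOP_CONF_DIR can silently point Pig at file:///. A minimal sketch of the conflict (directory names and values below are illustrative, not from a real install):

```shell
# Sketch of the conflict described above: two conf dirs on the classpath,
# each with its own core-site.xml declaring a different fs.defaultFS.
tmp=$(mktemp -d)
mkdir -p "$tmp/my-conf" "$tmp/etc-hadoop-conf"

# The conf dir pointed to by HADOOP_CONF_DIR (the one that should win):
cat > "$tmp/my-conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1.kluster.basjes.lan/</value>
  </property>
</configuration>
EOF

# A stale /etc/hadoop/conf with the default (local) filesystem:
cat > "$tmp/etc-hadoop-conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
EOF

# Hadoop loads core-site.xml from the FIRST classpath entry that has one;
# if the stale dir comes first, the client quietly ends up on file:///.
report=$(for dir in "$tmp/my-conf" "$tmp/etc-hadoop-conf"; do
  printf '%s: ' "$dir"
  grep -o '<value>[^<]*</value>' "$dir/core-site.xml"
done)
echo "$report"
rm -rf "$tmp"
```

    Grepping fs.defaultFS out of every conf directory on the classpath like this is a quick way to spot which copy the client is actually picking up.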
