Getting Hue running with a Windows cluster

This topic contains 6 replies, has 2 voices, and was last updated by Dave 5 months, 3 weeks ago.

  • Creator
    Topic
  • #50487

    Steve D
    Participant

    I’m trying to install Hue on a Linux server, but have it interact with an HDP Windows cluster.
    The filebrowser, HCatalog and Beeswax apps seem to be running, but I can’t get Pig running.

    1. This is for Hue 2.3, which comes with Hortonworks HDP 2.
    2. On the cluster nodes, configure webhdfs, webhcat and Oozie as per the Hortonworks docs.
    a. Additionally, in oozie-site.xml on the cluster nodes, change the oozie.service.WorkflowAppService.system.libpath
    value. Set it to /user/oozie/share/lib rather than /user/${user.name}/share/lib, since on Windows the services run as the “hadoop” user (otherwise Hue complains that the Oozie Share Lib is not in the default location). See the snippet after 2b.
    b. On one of the cluster nodes, install the Oozie shared libs into HDFS.
    The command line is:
    C:\hdp\oozie-4.0.0.2.0.6.0-0009\oozie-win-distro\share\lib>hdfs dfs -put * /user/oozie/share/lib/
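
    For reference, the oozie-site.xml property from 2a ends up looking like this (property name and value exactly as above; the rest of the file is unchanged):

    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/user/oozie/share/lib</value>
    </property>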

    3. On the Hue server, install Hue, Pig, HCatalog, Hive and HBase. (Pig needs HCatalog and Hive; Hue needs Pig and HBase.)
    a. Configure the Hue webserver as per the Hortonworks docs (we skipped SSL in our case)
    b. Configure Hadoop/Yarn/Beeswax as per Hortonworks docs
    c. Note: The Hue server will also be the Beeswax server, so it should bind to its FQDN
    d. Configure Pig as per the Hortonworks docs. Make sure JAVA_HOME is set in /etc/pig/conf/pig-env.sh.
    In the environment variables, set HCAT_HOME to the same value as HCATALOG_HOME. (Config sketches covering d, e, g, h and i follow this list.)
    e. Also, copy the hive-site.xml from one of the cluster nodes to /etc/hive/conf/ on the Hue server,
    and edit hive_conf_dir in hue.ini to point to /etc/hive/conf/ so Hue can find hive-site.xml.
    f. Configure Oozie, UserAdmin and WebHcat for Hue as per the Hortonworks docs.
    g. Create a /etc/hadoop/conf/core-site.xml on the Hue server.
    Set hadoop.tmp.dir (should be a directory on the Hue server) and fs.defaultFS (should be set to hdfs://<namenode>:8020).
    h. Create /etc/hadoop/conf/hdfs-site.xml on the Hue server.
    Set dfs.namenode.http-address to point to the HTTP address of the name node (<namenode>:50070).
    i. Edit /etc/hadoop/conf/hadoop-env.sh on the Hue server.
    Set JAVA_HOME to point to your JVM installation (e.g. /usr/lib/jvm/java-1.7.0-openjdk.x86_64).
    j. Patch /usr/lib/hue/desktop/libs/hadoop/src/hadoop/fs/webhdfs.py to set the superuser to ‘hadoop’, as per https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/IR76qQnYQB4
    (Make sure the cluster is started on Windows using the hadoop user!)
    4. Make sure the HDFS permissions are set to allow the hadoop user to be the owner at the root level (see the command sketch after this list).
    5. Start Hue
    6. When you first log into Hue, name the admin user ‘hadoop’ so it has write access to the Hive metastore (the ‘hadoop’ user in the Windows cluster has write access to /hive/warehouse in HDFS).
    7. At this point, Hive + HCatalog are working.
    8. Pig on the Linux box runs from a script or from the grunt console.
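
    For steps 3d, 3e, 3g, 3h and 3i, here are minimal sketches of the files involved. <namenode> stays a placeholder for your name node host, /tmp/hue is just an example hadoop.tmp.dir (any writable directory on the Hue server will do), and [beeswax] is where hive_conf_dir lives in the stock Hue 2.x hue.ini – adjust if your build differs.

    /etc/pig/conf/pig-env.sh (3d):
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
    # same value as HCATALOG_HOME, which is assumed to already be set
    export HCAT_HOME=$HCATALOG_HOME

    hue.ini (3e):
    [beeswax]
      hive_conf_dir=/etc/hive/conf

    /etc/hadoop/conf/core-site.xml (3g):
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://<namenode>:8020</value>
      </property>
      <property>
        <!-- any writable directory on the Hue server; /tmp/hue is just an example -->
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hue</value>
      </property>
    </configuration>

    /etc/hadoop/conf/hdfs-site.xml (3h):
    <configuration>
      <property>
        <name>dfs.namenode.http-address</name>
        <value><namenode>:50070</value>
      </property>
    </configuration>

    /etc/hadoop/conf/hadoop-env.sh (3i):
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64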
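
    And for step 4, running something along these lines from one of the cluster nodes should do it (assuming, as above, that the services run as the hadoop user; non-recursive, since only the root level needs to change):

    hdfs dfs -chown hadoop /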

    Pig jobs started from Hue simply produce the “pig -help” usage output, and the JobBrowser can’t find the job: <urlopen error [Errno -2] Name or service not known>



  • Author
    Replies
  • #50679

    Dave
    Moderator

    Hi Steve,

    Yes, that’s correct.
    Basically, if you have any issues with Hue (as it hasn’t been tested on Windows), we would not be able to raise bugs or create fixes.
    That’s not to say it won’t work, but only the cluster itself would fall under support – i.e. as a Windows cluster, if you are a paying support customer.

    Thanks

    Dave

    #50660

    Steve D
    Participant

    Dave,

    In the other thread you said a mixed Win/Linux cluster wouldn’t be a supported case.

    Given that in this instance the Hue server is not an active part of the cluster, would that be supported?

    Any thoughts on why the Pig command-line args are getting messed up?
    That feels like a bug somewhere.

    steve

    #50651

    Dave
    Moderator

    Hi Steve,

    I’m glad you got this working – it has been in the pipeline for me to test out, so I’m keen to give it a go.

    Thanks

    Dave

    #50611

    Steve D
    Participant

    Success!

    In the map step syslog I could see:
    2014-03-25 22:14:07,331 INFO [main] org.apache.hive.hcatalog.templeton.tool.TrivialExecService: Starting cmd: [cmd, /c, call, C:\hdp\\pig-0.12.0.2.0.6.0-0009/bin/pig.cmd, -D"mapreduce.job.credentials.binary=/c:/hadoop/data/hadoop/local/usercache/hadoop/appcache/application_1395717154689_0002/container_1395717154689_0002_01_000002/container_tokens"="-useHCatalog", -file, script.pig]

    The = just before the -useHCatalog looked suspicious.
    If I manually tried to run that command line on the datanode (replacing the commas between args with spaces), it wouldn’t work.
    But it did work once I removed the =”-useHCatalog” portion.

    So, the final workaround is this:
    On the Windows nodes, edit C:\hdp\pig-0.12.0.2.0.6.0-0009\bin\pig.cmd so that Pig always uses HCatalog.
    Do this by copying the line set HCAT_FLAG=”true” to immediately after the line set PIGARGS= (see the sketch below).
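
    A sketch of what the edited section of pig.cmd ends up looking like – the surrounding lines vary by HDP build; the point is just that HCAT_FLAG is forced on right after PIGARGS is initialised:

    set PIGARGS=
    rem copied from the -useHCatalog handling further down, so HCatalog is always enabled
    set HCAT_FLAG="true"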

    Then, when running Pig scripts from Hue, remove the -useHCatalog from the Pig arguments field when submitting the script (just below the script window).

    Pig will work in Hue now, but I’d guess that trying to submit a Pig job with other arguments may also fail.
    So it appears that the problem may lie in how Templeton concatenates the Pig arguments together with the credentials argument.

    Comments from Hortonworks on whether this is a correct fix (or whether it has other ramifications) would be appreciated,
    and on whether future HDP on Windows releases will address this.

    #50556

    Steve D
    Participant

    I’ve made some progress.
    I can see the Pig job being queued and executed as a YARN application, but it still fails with exit code 7.
    Within the log I can see the call to pig.cmd, but I’m not 100% sure what might be wrong with it.

    The log of the Map phase is here

    #50493

    Steve D
    Participant

    Pig works when run on one of the datanodes (or the master node).

    In the Hue server logs I can see the Pig job being invoked:
    [24/Mar/2014 21:11:19 +0000] views DEBUG User hadoop started pig job via webhcat: curl -s -d file=/tmp/.pigjobs/hadoop/tut1_sd_1395655879/script.pig -d statusdir=/tmp/.pigjobs/hadoop/tut1_sd_1395655879 -d callback=http://auhadooptest03.clients.global.arup.com:8000/pig/notify/$jobId/ -d arg=-useHCatalog

    And then if I look in the statusdir, the ‘exit’ file just has ‘7’ in it, the stderr file is empty, and the stdout file has the “pig -help” output as described above.
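
    The statusdir lives in HDFS, so the output files can be read directly, e.g. (paths taken from the log line above):

    hdfs dfs -cat /tmp/.pigjobs/hadoop/tut1_sd_1395655879/exit
    hdfs dfs -cat /tmp/.pigjobs/hadoop/tut1_sd_1395655879/stderr
    hdfs dfs -cat /tmp/.pigjobs/hadoop/tut1_sd_1395655879/stdout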

    So how can I debug why invoking Pig from Hue isn’t working, when it works from within the Windows cluster, and also when called from the console on the Hue server?
