Go Hadoop! Err, Hadoop and Go.

Personally, I’ve followed the Go Programming Language (golang) with increasing interest for a while and have been itching to really sink my teeth into it. I’ve always felt you never truly learn a programming language unless you use it to build a fairly large, real-world solution. It’s the only way to tackle real issues and gain some confidence for future battles with destiny… FTR, my first real project in Java was Hadoop, circa 2006. *smile*

So, I figured, what the hell, let’s go for it with Apache Hadoop and YARN! For those of you not familiar with YARN, it is the basis for the application architecture in Hadoop 2: it separates resource management from data processing, providing a more general processing platform and thereby enabling multiple applications and workloads to run in Hadoop. More details on that here.

This was not only a way for me to learn something new, but also a useful exercise to prove to ourselves that both Hadoop and YARN are ready to support non-Java applications natively. As you may know, both HDFS and YARN switched to a Protocol Buffers-based RPC system a short while ago, with the intent of better supporting compatibility across versions and cross-language clients. A shout-out to our friends at Spotify for coming up with snakebite, a native Python client for HDFS! Obviously, I’ve been very keen on supporting native, non-Java applications for YARN too; you can see where this is going…

With that context, the last piece of the puzzle was a free weekend a couple of weeks ago, with the added bonus of a couple of cross-country flights. I had a great time at the Chicago HUG talking YARN this month, particularly on the 66th floor of the Willis Tower… easily the best location ever for a Hadoop User Group! (Thanks to everyone, particularly to Trustwave for sponsoring and to Marc Slusar & Mike Segel, the organizers.) People who know me won’t be surprised to hear that I look forward to long flights without distractions; they’re great for cutting code! So… game, set, commit.

Fast forward, and here we are. gohadoop (obviously) is now on GitHub and includes a very early version of a Hadoop IPC client that speaks the Hadoop RPC protocol, along with YARN client libraries, so that one can write a full-fledged, native Go YARN application. To my knowledge, it’s the first-ever native non-Java application in YARN; here’s hoping for many, many more!

A quick tour:

That’s about it. Once you have a YARN cluster up and running, try running the dist_shell Go application:

$ HADOOP_CONF_DIR=conf go run hadoop_yarn/examples/dist_shell/client.go

See http://golang.org/ for more about Go itself, installation, etc.

If all goes well, you should see something like this on your YARN console:

[Screenshot: gohadoop on the YARN console]

That’s it!

I’ll talk more about this at the Hadoop YARN meetup on 9/27 at LinkedIn; feel free to hit me up with questions. Obviously it’s very early, but I hope it will be fun and useful. I’d love to get patches back too, so keep those pull requests coming.

As always, feel free to chat on the YARN Hortonworks Forum if you have more questions about YARN, or browse the great YARN content on the Hortonworks blog.


Comments

May 14, 2014 at 8:34 pm

It’s just awesome!!!

September 29, 2013 at 1:45 pm

Thanks Ralph, I’ve edited the post so that people unfamiliar with YARN have a lead as you suggested. Thanks again.

September 29, 2013 at 6:53 am

Gophers reading this might not know that YARN is the next-generation MapReduce for Hadoop. Might be worth explaining that near the start. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
