December 07, 2012

Hadoop Summit Session for Your Consideration: Taking Hadoop to the Clouds

If you’ve been following #hadoopsummit on Twitter, you might have noticed some excitement around the Community Choice, a public voting system that lets the entire Apache Hadoop community have a say in the sessions chosen for #hadoopsummit EU. Anyone can vote, and the top vote getters in each track will automatically be included in the #hadoopsummit EU agenda, March 20-21, 2013.

If you’re still deciding which sessions, in which tracks, should be so lucky to get your vote, I have one for your consideration. Our very own Steve Loughran went beyond the twitter-sphere and created a blog to promote why you should vote for his session: Taking Hadoop to the Clouds.

Before we proceed to Steve’s case, remember to vote in the Community Choice process. Help us shape the conference agenda by getting in your vote! Deadline is December 14, so vote today!

This is a guest blog post from Steve, making a strong case for why you should pick his session.

The Hadoop Summit vote list is up, and I have two proposals, both currently undervoted. Even though I’m on the review committee for the futures strand, not even I could push through a talk that had zero votes on it; ideally, I’d like my talks to get in through popular acclaim. I could just create 400 fake email addresses and vote-stuff that way, but I’m lazy.

For that reason, I’m going to talk in detail about why my talks will be so excellent that even thinking about leaving them out could be detrimental to the entire conference.

One of my talks is “Taking Hadoop to the Clouds”.

There are two competitors:

  1. Deploying Hadoop in the Cloud, which looks at options, details and best practices. I don’t see anything particularly compelling in the abstract; I assume it has more votes because it’s the one that comes up first. Or they are trying the many-email-address-vote-stuffing technique(*).
  2. How to Deploy Hadoop Applications on Any Cloud & Optimize Price Performance. This could be interesting, as it covers how CliQr deploys Hadoop on different infrastructures. It sounds like a rackable-style orchestration layer above infrastructures; for Hadoop, it may have similarities with MastodonC’s Kixi work.

Why, then, should people vote for mine?

I’m giving the talk.

This is not me being egocentrically smug about the quality of my presentations; it’s that I’m reasonably confident I know a lot about the area.

  1. My last time at HP Labs was spent on the implementation of the “Cells” virtual infrastructure: declarative configuration of the entire cluster design. The details were presented at the 5th IEEE/ACM conference on Utility and Cloud Computing, and will no doubt be in the ACM library. This means I know about IaaS implementation details; the problems of placement, why networking behaves the way it does, image management, what UIs could look like, what the APIs could be, etc.
  2. I’ve spent a lot of time publicly making Hadoop cloud-friendly. I presume that MS Azure and AWS ElasticMR have put in more hours, but unless they’re going to talk about their work, Tom White and I are the next choices. Jun Ping and VMware colleagues have done a lot too, including big patches into the codebase, but I don’t see any submissions from them.
  3. I have opinions on the matter. They aren’t clear-cut “cloud good/physical bad” or “physical good/cloud bad”. There are arguments either way; it depends on what you want to do, what your data volume is, and where it lives.
  4. I’m still working in the area, in Hadoop itself and the code nearby.

Recent cloud-related activities include

  • HADOOP-8545: a Swift Filesystem driver for OpenStack. This is something everyone running Hadoop on Rackspace or other OpenStack clusters will want. This week two different implementations have surfaced; getting them merged together is going to be the next activity.
  • WHIRR-667: Add Whirr support for HDP-1 installation.
  • Ambari with Whirr. Proof of concept more than anything else.
  • jclouds and Rackspace UK throttling. Adrian Cole managed to reduce the impact of issue-549, which is good, as I don’t really want to get sucked into a different OSS codebase.
  • Other things that I’m not going to talk about yet.

That’s why people should vote for me. The other talks will be about “how we got Hadoop to work in a virtual world”; mine will be about how we improved Hadoop to work in a virtual world.

(*) PS: for anyone planning the many-email-accounts approach, remember that the email addresses are something we reviewers can look at, and many sequential accounts all giving three votes to a single talk will show up as “statistically significant”. Russ has the data, and he likes his analyses. He may even have the IP addresses.

[Photo: an interview with Page 6 Guy at ApacheCon]


You can also access Steve’s blog here.

