Big Data in London – Thoughts From the Tube

Hortonworks sponsored the O’Reilly Strata conference earlier this month at the Hilton Metropole in London. It was great meeting big data enthusiasts at the conference. We had fun giving away our little green mascot and came away pleasantly surprised at the level of interest in Big Data in the UK and Europe. There were over 500 attendees, which for a first-time conference is a very good result. Conversations ranged from the introductory “What is Apache Hadoop?” to deep discussions of how Hadoop is being used in production today. After talking to other vendors, attendees and organizers, it appears that the market is somewhere between 12 and 18 months less mature than the Big Data market in the US. That said, we think adoption could occur more quickly than it did in the US as the technology and ecosystem evolve heading into 2013. Below are some perspectives from our team at this conference.

Inspiration from the Tube

Riding the Tube around London, we couldn’t help but take some guidance and inspiration from the prominently placed signs for the “Way Out” and the frequent announcements warning travelers to “Mind the Gap”. These signs and notices serve as informal guidance for approaching the Big Data market.

Way Out

As more and more organizations realize that their current systems are at risk of being buried underground by the onslaught of Big Data, many are starting to see that Hadoop offers a Way Out. How, you ask? It gives them a low-cost, scale-out infrastructure: with Hadoop they can cluster commodity servers and storage together to capture and process data and exchange it with existing systems. At the same time, a modern enterprise-ready Hadoop platform like the Hortonworks Data Platform enables them to operate these clusters efficiently and effectively, but that is for another post.

Mind the Gap

That said, when selecting a Hadoop platform it is important to Mind the Gaps in the technology and look for a platform that is being deeply integrated with existing enterprise architecture systems. The best solutions to rely on are those created through engineering-level engagements that maximize performance and optimize the interaction between the systems.

Deep technical interest and curiosity

Many of the visitors had technical questions, for which we pulled in our UK R&D person, Steve Loughran, armed with copies of the Hadoop 1.x and trunk source trees. The content of those discussions showed that people are already using Hadoop at scale in parts of Europe and nearby. Indeed, we had conversations with people as far away as Finland and Israel, showing that this conference drew a wide audience – and that those people were building up their skills in the technology and applications of Big Data.

There was also the London and South of England Hadoop community, whose members tend to know each other from the London HUG events and other workshops. Many of them are drawn from startups: Last.fm was one of the earliest adopters of Hadoop, and Datasift, Mendeley and others are now becoming well known. Alongside them are the enterprises with datasets that historically were too big to store cost-effectively: the telcos, the media companies with their advert click-throughs, and the like. These people have the data and are ramping up the skills to make use of it. For these organizations, bringing up large Hadoop clusters matters, and they’ve realized that Hadoop internals aren’t something they need to know themselves, any more than they need Linux kernel skills. What they do need is Data Science skills: people who know the right questions to ask of that data, how to ask Hadoop for the data to provide the answers, how to interpret those answers, and how to present them.

Many of the Strata topics looked at these problems: cleaning up data, conducting effective A/B tests, and examples of highly effective visualizations of large and near-real-time data sources. One memorable talk from the Formula 1 race team McLaren covered how they had transformed their organization to be data-driven, using the answers from their in-race telemetry and information gleaned about competitors from public sources to shape their thinking. This shows a future for organizations: to follow McLaren, Google and others in not only collecting and analyzing data but embracing it.

Exciting future for Big Data in Europe

Overall, we had many great conversations with attendees regarding their current and, more commonly, future plans for Hadoop and other Big Data technologies. Many of the sessions were packed, including a standing-room-only Microsoft talk on current Hadoop-related integration and future plans.

Awareness of Apache Hadoop as a technology was respectable but certainly below that in the US.

Interest in technical and business benefits of Hadoop

Shaun Connolly’s sessions on Hadoop and data warehousing were well attended, as was Steve Loughran’s session on High Availability Hadoop including a live demo.

Finally, Transport for London are themselves participants in the Big Data revolution, with their live data feeds of tube, bus and bike-sharing all there for analysis and integration with other data sources: http://www.tfl.gov.uk/businessandpartners/syndication/16493.aspx. If anyone wants some interesting datasets to learn Pig on, these could be them.

Overall, this was a well-run event that featured interesting keynotes. It was vibrant and ripe for growth, and we were very honored to be approached by multiple user groups seeking speakers from Hortonworks to talk about big data experiences and expertise following this conference.

Thanks to those who attended our sessions and visited and chatted with us at our booth. For copies of Shaun Connolly’s and Steve Loughran’s presentations, you can access them here and here.

Until next time London, mind the gap.
