Big Data in London – Thoughts From the Tube
Hortonworks sponsored the O’Reilly Strata conference earlier this month at the Hilton Metropole in London. It was great meeting big data enthusiasts at the conference. We had fun giving away our little green mascot and were pleasantly surprised by the level of interest in Big Data in the UK and Europe. There were over 500 attendees, which is a very good result for a first-time conference. Conversations ranged from introductory “What is Apache Hadoop?” questions to deep discussions of how Hadoop is being used in production today. After talking to other vendors, attendees and organizers, it appears that the market is somewhere between 12 and 18 months less mature than the Big Data market in the US. That said, we think adoption here could happen more quickly than it did in the US, as the technology and ecosystem continue to evolve heading into 2013. Below are some perspectives from our team at the conference.
Inspiration from the Tube
Riding the tube around London, we couldn’t help but take some guidance and inspiration from the prominently placed signs for the “Way Out” and the frequent announcements warning travelers to “Mind the Gap”. These signs and announcements make fitting informal guidance for approaching the Big Data market.
As more and more organizations realize that their current systems are at risk of being buried underground by the onslaught of Big Data, many are starting to see that Hadoop offers a Way Out. How, you ask? Because it gives them a low-cost, scale-out infrastructure to capture, process and exchange data. With Hadoop they can cluster commodity servers and storage together and exchange data with their existing systems. At the same time, a modern enterprise-ready Hadoop platform like the Hortonworks Data Platform enables them to operate these clusters efficiently and effectively, but that is a topic for another post.
Mind the Gap
That said, when selecting a Hadoop platform, it is important to Mind the Gaps in the technology and look for a platform that is being deeply integrated with existing enterprise architecture systems. The best solutions to rely on are those created through engineering-level engagements that maximize performance and optimize the interaction between the systems.
Deep technical interest and curiosity
Many of the visitors had technical questions, for which we pulled in our UK R&D person, Steve Loughran, armed with copies of the Hadoop 1.x and trunk source trees. The content of those discussions showed that people are already using Hadoop at scale in parts of Europe and nearby. Indeed, we had conversations with people as far away as Finland and Israel, showing that this conference drew a wide audience – and that those people were building up their skills in the technology and applications of Big Data.
There was also the London and South of England Hadoop community, whose members tend to know each other from the London HUG events and other workshops. Many of them come from startups: Last.fm, one of the earliest adopters of Hadoop, as well as Datasift, Mendeley and others now becoming well known. Alongside them were the enterprises with datasets that historically were too big to store cost-effectively: the telcos, the media companies with their advert click-throughs, and the like. These people have the data and are ramping up the skills to make use of it. For these organizations, bringing up large Hadoop clusters matters, and they’ve realized that Hadoop internals aren’t something they need to know themselves, any more than they need Linux kernel skills. What they do need is Data Science skills: people who know the right questions to ask of that data, how to ask Hadoop for the data that provides the answers, how to interpret those answers, and how to present them.
Many of the Strata talks looked at these problems: cleaning up data, conducting effective A/B tests, and highly effective visualizations of large and near-real-time data sources. One memorable talk from the Formula 1 race team McLaren covered how the team had transformed itself into a data-driven organization, using the answers from its in-race telemetry and information gleaned about competitors from public sources to shape its thinking. This points to a future for other organizations: following McLaren, Google and others, not only to collect and analyze data but to embrace it.
Exciting future for Big Data in Europe
Overall, we had many great conversations with attendees about their current and, more commonly, future plans for Hadoop and other Big Data technologies. Many of the sessions were packed, including a standing-room-only Microsoft talk on current Hadoop-related integration and future plans.
Awareness of Apache Hadoop as a technology was respectable, but certainly below that in the US.
There was also interest in both the technical and business benefits of Hadoop.
Finally, Transport for London are themselves participants in the Big Data revolution: their live data feeds for the tube, buses and bike sharing are all available for analysis and integration with other data sources: http://www.tfl.gov.uk/businessandpartners/syndication/16493.aspx. If you want some interesting datasets to learn Pig on, these could be a good place to start.
All in all, this was a well-run event with interesting keynotes. It was vibrant and ripe for growth, and we were very honored to be approached at the conference by multiple user groups seeking Hortonworks speakers to talk about our big data experience and expertise.
Until next time London, mind the gap.