Category Archives: Industry Happenings


7 Key Drivers for the Big Data Market

I attended the Goldman Sachs Cloud Conference and participated in a panel focused on “Data: The New Competitive Advantage”. The panel covered a wide range of topics, but kicked off with two basic questions:

“What is Big Data?” and “What are the drivers behind the Big Data market?”

While most definitions of Big Data focus on the new forms of unstructured data flowing through businesses with new levels of “volume, velocity, variety, and complexity”, I tend to answer the question using a simple equation:

Big Data = Transactions + Interactions + Observations

The following graphic illustrates what I mean:

Record Support for Hadoop Summit

In case you didn’t see the news today, Hadoop Summit announced record ecosystem support for this year’s conference. The original and world’s largest Apache Hadoop conference, now in its fifth year, is being sponsored this year by more than 40 traditional and open source software and services companies.

Hortonworks and our co-host Yahoo! would like to thank the following companies for helping to make Hadoop Summit possible:

Hadoop Observations from the U.K.

As part of Big Data Week, Dan Harvey of the London Hadoop User Group organised an afternoon session for the user group, which we were glad to sponsor, along with Canonical and Facegroup. I had the pleasure of presenting my view of the current and future status of Apache Hadoop to an audience that ranged from those curious about Hadoop to heavy users.

Every talk of the day was excellent, from the use cases by DataSift, Mendeley and MusicMetric, to the talk by Francine Bennett of MastodonC on the CO2 footprint of different cloud computing infrastructures, including a live dashboard on the current CO2/hour of many cloud infrastructure sites.

In my discussions with attendees, I was impressed by how broadly Hadoop is starting to be adopted in the U.K. There is adoption from “pure data” companies like Mendeley, DataSift, MusicMetric and Last.fm, as well as media companies and financial organisations. London is a centre of finance and data and, as such, from a Hadoop perspective, it is a source of data waiting to be stored and mined.

Hortonworks Welcomes Citrix and CloudStack to the Apache Community

We are pleased to support today’s announcement from Citrix that they have contributed CloudStack to the Apache community. For those new to CloudStack, it is open source cloud computing software that helps organizations build and manage cloud infrastructures. It is similar to the Amazon Web Services EC2 environment, except that it enables organizations to build public, private or hybrid cloud environments using their own pooled computing resources.

Citrix announced today that they were reaffirming their commitment to open source by working with the Apache Software Foundation to make CloudStack 3 an Apache project, released under the Apache License 2.0. This is yet further acknowledgement that Apache is the logical home for open source projects that are transforming the enterprise software industry. As a Gold Sponsor of the ASF and major contributor to Apache projects, Hortonworks is pleased that leading vendors such as Citrix are recognizing the value that Apache provides, as the preferred destination for enterprise-class open source software, in accelerating development, innovation and adoption.

Announcing the Hadoop Summit Community Choice Winners

Thank you to the community members who cast over 8,000 votes during the Hadoop Summit Community Choice voting process. The turnout far exceeded our expectations and is further evidence that the momentum behind Apache Hadoop has never been stronger.

As we announced, the sessions with the most votes in each track are automatically accepted into the Hadoop Summit agenda. As such, I am pleased to announce the winners of the Hadoop Summit Community Choice vote and the first confirmed sessions in the Hadoop Summit program:

Future of Apache Hadoop track: Dynamic Namespace Partitioning with Giraffa File System, Konstantin Shvachko (eBay)

Deployment and Operations track: Dynamic Reconfiguration of Apache ZooKeeper, Alexander Shraer and Benjamin Reed (Yahoo!)

Enterprise Data Architecture track: iMStor: Hadoop Storage-based Tiering Platform, Vishal Malik (Cognizant Technology Solutions)

Applications and Data Science track: Hadoop & Cloud @Netflix: Taming the Social Data Firehose, Mohammad Sabah (Netflix)

Analytics and Business Intelligence track: Mapping and Reducing Passenger Turbulence using Big Data, Farhan Hussain and Saad Patel (Open Source Architect)

Hadoop in Action track: The Merchant Lookup Service at Intuit, Vrushali Channapattan (Intuit)

Hadoop Summit Community Choice

As I first mentioned when we announced Hadoop Summit 2012, we are focused on making Hadoop Summit the preeminent conference for the Apache Hadoop community. Today I’m pleased to tell you about Community Choice, a public online voting system that enables the entire Apache Hadoop community to have a say in the sessions chosen for Hadoop Summit. Anybody can vote and the top vote getters in each track will automatically be included in the Hadoop Summit agenda.

One of the things you will notice when you vote is the large number of abstracts that were submitted for the conference. In fact, there were 267 submissions for Hadoop Summit, more than double the number of submissions from last year’s highly successful event. There are six tracks, each of which has a wide selection of compelling topics. Another interesting fact is that there were submissions from 120 different organizations (companies, universities and government agencies). It’s becoming even clearer that Apache Hadoop is having a significant impact in the data industry.

In addition to Community Choice, there is also a content selection committee in place that will identify the other sessions for Hadoop Summit. This is also a community effort. The content selection committee is made up of 36 leaders from the ecosystem representing 27 different organizations (vendors, end users and universities). The committee is hard at work reviewing sessions and we expect to be able to publish the final agenda before the end of March.

Please remember to vote in the Community Choice process. If you ever wanted to have input into a conference, this is your chance. Voting ends March 20th, so please vote today.

~E14

Hadoop Summit 2012 is Coming

Hi Folks,

I’m happy to report that Hadoop Summit will be back for its 5th year. This year, Hortonworks and Yahoo! are jointly hosting the conference, which will take place on June 13th and 14th at the San Jose Convention Center.

This year’s event promises to be bigger and better than ever. We have extended the conference to a second day, added additional session tracks and expect to showcase even more compelling and useful presentations. You will be really impressed when you see what we have planned.

Good Times at ApacheCon 2011

I spent some time last week at ApacheCon NA 2011 in Vancouver, BC. It was a good experience and I enjoyed catching up with friends and colleagues involved in the Hadoop project and also meeting some of the executives of the Apache Software Foundation in person. It is clear that the Apache community is thriving and that interest in Hadoop remains very high.

Hortonworks is committed to supporting Apache and we are pleased to have been a gold sponsor of this event. I delivered the day two keynote at ApacheCon on the success of Apache Hadoop. To view my presentation please visit Slideshare.net.

~E14
@jeric14, @hortonworks 

Apache Lucene Eurocon Keynote

I just spent a day at the Apache Lucene Eurocon conference in Barcelona. I gave a keynote presentation on how the Apache Lucene & Solr communities had a lot to gain from Apache Hadoop and how Hadoop could also gain from their contributions and technology. It was a good show and it was great to have a chance to meet the Lucid Imagination folks and others in the Apache search community.

I have more questions than answers right now in terms of how these tool chains will be combined over time, but I am confident that they will be. The Mahout session was packed, which is a good predictor of more Lucene & Solr + Hadoop users coming soon. The Solr sessions were a trip down memory lane for me. The Solr community is building out capabilities that used to be available only to the Big Internet Search players. It is nice to see these ideas having wider impact via Apache.

The slides from my keynote are now available on Slideshare.net.

Gracias Lucid Labs folks

~E14
@jeric14, @hortonworks

RIP Dennis M. Ritchie

Dennis M. Ritchie was a giant of our craft and a truly great teacher. Millions of folks owe their passion for programming to K&R. It literally started their careers.

We are very sad to hear that Dennis M. Ritchie passed away after a prolonged illness. We join others in offering our condolences to his family and hope he is in a more comfortable place.

Thank you so much. You’ve made such a difference to so many.

Rest in peace.

 

- Arun C. Murthy

Recent Hortonworks Presentations

Interest in Hortonworks and Apache Hadoop continues to rise. This past week, I presented at two conferences and had a number of requests to share our slides. Both presentations are now posted on slideshare.net and linked to in this blog.

The first conference was the Cowen Big Data Day in New York City. The slides for this presentation are available here. The Cowen Group is a leading financial services and investment banking firm. They hosted a one-day conference on Big Data for the investment community and invited the CEOs of many of the leading providers in the market, including Hortonworks. My presentation covered the role that Apache Hadoop is playing within enterprise architectures and the long-term opportunities that exist. There is also some insight into the Hortonworks strategy that might be interesting to folks who want to better understand our business.

Do You Have an Interesting HDFS Use Case?

Hi Folks,

I’m talking at a storage conference this month and I’d like to see if crowdsourcing will generate interesting examples and studies that I can include in my presentation.

What I’d like is interesting cases where HDFS has been compared to other storage technologies. I’m especially interested in cases where the decision was made to deploy HDFS rather than to buy an alternative technology. I’m also interested in any large deployments where HDFS is being used for interesting things beyond being the serving layer for MapReduce and HBase. If you have an interesting story, slides or other material that you think might be helpful for an HDFS presentation, please send me a note at HdfsCases2011-group@hortonworks.com.

Takeaways from OSCON 2011

For the first time in its history, OSCON, the premier open-source conference, had a special OSCON Data sub-conference. Apache Hadoop had a full track dedicated to it at OSCON Data. This clearly was indicative of the interest in Big Data and the central role Apache Hadoop plays in the space. A special shout out to Bradford Stephens and Sarah Novotny, the program chairs, who did a fantastic job with OSCON Data.

Hortonworks was well represented at OSCON Data 2011. Owen O’Malley and I presented talks, and Alan Gates took a short break from his vacation to stop by.

Owen presented a very interesting talk on ‘Developing and Deploying Hadoop Security’. The presentation covered the goals of Hadoop Security and how users can apply the new features to secure their HDFS and MapReduce clusters. Owen also talked about Yahoo’s experiences deploying the back-ported Hadoop Security features on their science and production clusters. He also covered the several man-years of effort that the Hortonworks team (formerly at Yahoo!) spent developing this comprehensive and well-integrated security work.

Hadoop Summit Presentations

More news. We’ve put the Hortonworks slides from the Hadoop Summit on slideshare.net for those who are interested in seeing them:

Hortonworks Hadoop Summit 2011 Keynote – Eric14 (my keynote)

Crossing the Chasm: Hadoop for the Enterprise – Sanjay Radia

Next Generation Apache Hadoop MapReduce – Arun C. Murthy

Introducing HCatalog (Hadoop Table Manager) – Alan Gates

HDFS Federation and Other Features – Suresh Srinivas and Sanjay Radia
