Posts by Eric Baldeschwieler:


Delivering on our Promises

Back in late June when Hortonworks was officially announced at Hadoop Summit, we explained that our strategy was going to focus on accelerating the development and adoption of Apache Hadoop. We made bold statements about the opportunities that Apache Hadoop had to become the de facto platform for big data. We even predicted that half of the world’s data would be processed by Apache Hadoop within five years.

We also talked about how in order for all of that to happen, we needed to address the technical and knowledge gaps that exist. We needed to heavily invest in engineering to make Hadoop easier to install, manage and use for enterprises and more open and extensible for a growing ecosystem of technology and service providers.

Today we are making a series of announcements that are an important first step in delivering on these promises:

Read More

Apache Lucene Eurocon Keynote

I just spent a day at the Apache Lucene Eurocon conference in Barcelona. I gave a keynote presentation on how the Apache Lucene & Solr communities had a lot to gain from Apache Hadoop and how Hadoop could also gain from their contributions and technology. It was a good show and it was great to have a chance to meet the Lucid Imagination folks and others in the Apache search community.

I have more questions than answers right now in terms of how these tool chains will be combined over time, but I am confident that they will. The Mahout session was packed, which is a good predictor of more Lucene & Solr + Hadoop users coming soon. The Solr sessions were a trip down memory lane for me. The Solr community is building out capabilities that used to only be available to the Big Internet Search players. It is nice to see these ideas having wider impact via Apache.

The slides from my keynote are now available on Slideshare.net.

Gracias Lucid Labs folks

~ E14
@jeric14 @hortonworks

The Why’s Behind the Microsoft and Hortonworks Partnership

If, when we started building an Apache Hadoop team at Yahoo!, someone had told me that in the future we would partner with Microsoft to improve Hadoop’s performance on Windows, I would have found the prediction hard to believe. The first time a Microsoft executive suggested that they would like to work with us to improve Apache Hadoop, I told them I found their proposal “mind-bending”. I also told them that if we could do it the right way, I liked the idea. Our core mission is to bring Apache Hadoop to the widest possible user base, and Windows and SQL Server have very large user bases.

Why is adding a fraction of the Microsoft Windows, Azure and SQL Server user bases to the Hadoop community a good thing for Apache Hadoop? Microsoft technology is used broadly across enterprises today. Ultimately, open source is all about community building. A growing user community feeds a virtuous circle. More users mean more visibility for the project. Their successes fuel the adoption of the project by more users. More users mean more folks who will ultimately become contributors or committers. This makes the code evolve more quickly, which allows it to satisfy more use cases and hence attract more users, which further drives the project forward. As the number of users and developers grows, more companies will decide that they can build hardware, tools, applications and services for Apache Hadoop users. Growth of the ecosystem allows more users to solve more problems with Apache Hadoop, driving further growth, etc. Feeding this virtuous circle is what Hortonworks is all about.

Read More

Reality Check: Contributions to Apache Hadoop

Several weeks ago, Hortonworks published a blog post that highlighted the tremendous contributions that Yahoo has made to Hadoop over the years. The point was two-fold: 1) to pay homage to our former employer, and 2) to clarify that Yahoo will continue to be a major contributor to Hadoop.

Earlier this week, Cloudera responded to our post, calling it a misleading story. While we generally don’t comment on another vendor’s blogs, even if they assert things that we find questionable, we felt we had to respond to this one.

Underneath a lot of words, their claim was that Cloudera had made the most contributions to Apache Hadoop this year of any single organization.

Read More

Oracle and the Apache Hadoop Community

Oracle embraced Apache Hadoop this week with the announcement of the Oracle Big Data Appliance that includes an open source distribution of Apache Hadoop.

We welcome Oracle to the Apache Hadoop community and look forward to their participation in the growing Hadoop ecosystem. We hope that Oracle will commit to using the official releases of Hadoop from the Apache Foundation. We believe that such a commitment will allow their customers to extract the most possible value from their Hadoop Appliances and facilitate the rapid growth of the Hadoop ecosystem.

Read More

Fourteen Reasons to Become a Hortonworker

Hi Folks,

Hortonworks is a fast-growing software company that is looking for new talent that can make a positive impact on our company whether in development, QA and test, support and training or on the business side of the operations.  We recently updated the careers section of our website, adding a number of exciting job openings. We are very interested in filling each of these roles with great people as soon as possible.

Why choose Hortonworks?  Here are fourteen reasons:

#1 Work for a company with a mission. We are architecting the future of big data. As the leading contributor to Apache Hadoop, Hortonworks is helping to revolutionize and commoditize the storage and processing of big data. This is the type of opportunity that comes around only once in a generation.

Read More

Sponsoring the Apache Software Foundation

Hortonworks Apache Software Foundation Gold Sponsor

I’m pleased to announce that we’ve become a sponsor of the Apache Software Foundation (ASF). The ASF has been fundamental to Apache Hadoop’s success and our team’s ability to meet our goals since our inception as the Yahoo! Hadoop team in 2006. This is why we convinced Yahoo! to become an Apache Platinum Sponsor back in 2007, which it remains to this day. Now that we are operating as an independent company and continuing to benefit from Apache’s support, we made it a priority to continue to sponsor Apache.

We are committed to growing and fostering the Apache Hadoop ecosystem and making Hadoop the platform of choice for managing big data in the enterprise. Our sponsorship deepens this commitment and will help the ASF to continue to grow and prosper.

Read More

Recent Hortonworks Presentations

Interest in Hortonworks and Apache Hadoop continues to rise. This past week, I presented at two conferences and had a number of requests to share our slides. Both presentations are now posted on slideshare.net and linked to in this blog.

The first conference was the Cowen Big Data Day in New York City. The slides for this presentation are available here. The Cowen Group is a leading financial services and investment banking firm. They hosted a one-day conference on Big Data for the investment community and invited the CEOs of many of the leading providers in the market, including Hortonworks. My presentation covered the role that Apache Hadoop is playing within enterprise architectures and the long-term opportunities that exist. There is also some insight into the Hortonworks strategy that might be interesting to folks that want to better understand our business.

Read More

Do You Have an Interesting HDFS Use Case?

Hi Folks,

I’m talking at a storage conference this month and I’d like to see if crowdsourcing will generate interesting examples and studies that I can include in my presentation.

What I’d like is interesting cases where HDFS has been compared to other storage technologies. I’m especially interested in cases where the decision was made to deploy HDFS rather than to buy an alternative technology. I’m also interested in any large deployments where HDFS is being used for interesting things beyond being the serving layer for MapReduce and HBase. If you have an interesting story, slides or other material that you think might be helpful for an HDFS presentation, please send me a note at HdfsCases2011-group@hortonworks.com.

Read More

Best Practices for Selecting Apache Hadoop Hardware

We get asked a lot of questions about how to select Apache Hadoop worker node hardware. During my time at Yahoo!, we bought a lot of nodes with 6*2TB SATA drives, 24GB RAM and 8 cores in a dual socket configuration. This has proven to be a pretty good configuration. This year, I’ve seen systems with 12*2TB SATA drives, 48GB RAM and 8 cores in a dual socket configuration. We will see a move to 3TB drives this year.

What configuration makes sense for any given organization is driven by such ratios as the storage-to-compute ratio of your workload and other factors that cannot be answered in a generic way. Further, the hardware industry moves quickly. In this post I’ll try to outline the principles that have generally guided Hadoop hardware configuration selections over the last six years. All of these thoughts are aimed at designing medium to large Apache Hadoop clusters. Scott Carey made a good case for smaller machines for small clusters the other day on the Apache mailing list.
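As a rough illustration of the kind of ratio mentioned above, here is a minimal sketch that compares the two worker node configurations described earlier. The `NodeConfig` class and its field names are hypothetical, made up for this example. It assumes "8 cores" means 8 cores per node in total, and it looks only at raw disk capacity, ignoring replication and space reserved for anything other than data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical description of one worker node (illustration only)."""
    name: str
    drives: int    # number of SATA drives in the node
    drive_tb: int  # capacity of each drive in TB
    ram_gb: int    # RAM in GB
    cores: int     # assumed to be the total core count per node

    @property
    def raw_storage_tb(self) -> int:
        # Raw capacity only; replication and overhead are not modeled.
        return self.drives * self.drive_tb

    @property
    def storage_tb_per_core(self) -> float:
        # A simple storage-to-compute ratio: raw TB per core.
        return self.raw_storage_tb / self.cores

    @property
    def ram_gb_per_core(self) -> float:
        return self.ram_gb / self.cores


# The two configurations mentioned in the post.
older = NodeConfig("6*2TB node", drives=6, drive_tb=2, ram_gb=24, cores=8)
newer = NodeConfig("12*2TB node", drives=12, drive_tb=2, ram_gb=48, cores=8)

for cfg in (older, newer):
    print(
        f"{cfg.name}: {cfg.raw_storage_tb} TB raw, "
        f"{cfg.storage_tb_per_core:.1f} TB/core, "
        f"{cfg.ram_gb_per_core:.1f} GB RAM/core"
    )
```

Under these assumptions, the second configuration doubles both raw storage per core and RAM per core relative to the first. Whether that is the right trade-off still depends on the workload.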

Read More

Gone Viral: Next Generation of Apache Hadoop MapReduce

Hi Folks,

I’d like to congratulate Arun Murthy on his very popular Hadoop Summit talk. SlideShare.net reports that his presentation has gone viral. They originally promoted it as the most discussed SlideShare.net presentation on LinkedIn and yesterday they promoted it as the most tweeted-about presentation. In both cases, the presentation was moved up to the front page.

Arun is a Hortonworks founder and MapReduce expert. His talk does a great job of highlighting some of the current limitations in MapReduce and then outlining the roadmap for improving areas such as scalability, high availability, cluster utilization and support for paradigms other than MapReduce.

Read More

Introducing the Hortonworks Founders

Hi Folks,

Things are going really well at Hortonworks.  We’re in our new office, connected to our data center of nearly 1000 nodes (thanks Yahoo!) and working away on our new computers.  We’ve gotten a lot done in a very small amount of time.  Along with our excellent G&A team, a key reason we’ve gotten so much done is that our founders have really stepped up and are taking responsibility for getting their teams moving.

I wanted to take this opportunity to mention them, because without them Hortonworks wouldn’t be Hortonworks. These are the team leads and architects I’ve worked with and relied on over the last 4-6 years while we invested in taking Apache Hadoop from an early prototype to what it is today. Without our founders and their teams, MapReduce, HDFS and Pig would not be what they are today.

Read More

Hadoop Summit Presentations

More news. We’ve put the Hortonworks slides from the Hadoop Summit on slideshare.net for those that are interested in seeing them:

Hortonworks Hadoop Summit 2011 Keynote – Eric14 (my keynote)

Crossing the Chasm: Hadoop for the Enterprise – Sanjay Radia

Next Generation Apache Hadoop MapReduce – Arun C. Murthy

Introducing HCatalog (Hadoop Table Manager) – Alan Gates

HDFS Federation and Other Features – Suresh Srinivas and Sanjay Radia

Read More

Hortonworks Launches in the Press

Wow, Hortonworks day one! Our first day of being “on the record”. It’s been a busy, but very productive day. Now that we are talking publicly about Hortonworks, there has been a LOT of interest in what we’re doing from analysts and journalists. So far the feedback we’ve received has been very positive.

I haven’t been able to read every article but a few have caught my eye that I wanted to share. I’ll keep adding to the list over the next couple of days as the word spreads about Hortonworks.

Read More

Hortonworks Manifesto

We’re glad to have finally launched Hortonworks after months of planning and speculation. I thought I’d use the opportunity of my first Hortonworks blog to lay out who we are and what we’re all about.

Our History

Hortonworks was formed by Yahoo! and Benchmark Capital in June 2011 in order to accelerate the development and adoption of Apache Hadoop. We believe that Apache Hadoop will become the de facto platform for storing, managing and analyzing “big data,” namely the exploding volume of data being generated daily by organizations around the globe.

As one of the creators, the primary contributor to, and one of the leading users of Apache Hadoop, Yahoo! has extensive experience in realizing exceptional business value from the Hadoop platform. In fact, Hadoop is now behind every click at Yahoo!, running on 42,000 servers and delivering personalized content and experiences to nearly 700 million consumers worldwide. Apache Hadoop helps drive Yahoo!’s powerful advertising platform and enables Yahoo! to provide enhanced anti-spam capabilities in Yahoo! Mail, among many other uses.

Read More
