The Hortonworks Blog

More from Eric Baldeschwieler

Back in late June when Hortonworks was officially announced at Hadoop Summit, we explained that our strategy was going to focus on accelerating the development and adoption of Apache Hadoop. We made bold statements about the opportunities that Apache Hadoop had to become the de facto platform for big data. We even predicted that half of the world’s data would be processed by Apache Hadoop within five years.

We also talked about how in order for all of that to happen, we needed to address the technical and knowledge gaps that exist.…

I just spent a day at the Apache Lucene Eurocon conference in Barcelona. I gave a keynote presentation on how the Apache Lucene & Solr communities had a lot to gain from Apache Hadoop and how Hadoop could also gain from their contributions and technology. It was a good show and it was great to have a chance to meet the Lucid Imagination folks and others in the Apache search community.…

If, when we started building an Apache Hadoop team at Yahoo!, someone had told me that we would one day partner with Microsoft to improve Hadoop’s performance on Windows, I would have found the prediction hard to believe. The first time a Microsoft executive suggested that they would like to work with us to improve Apache Hadoop, I told them I found their proposal “mind-bending”. I also told them that if we could do it the right way, I liked the idea.…

Several weeks ago, Hortonworks published a blog post that highlighted the tremendous contributions that Yahoo has made to Hadoop over the years. The point was two-fold: 1) to pay homage to our former employer, and 2) to clarify that Yahoo will continue to be a major contributor to Hadoop.

Earlier this week, Cloudera responded to our post, calling it a misleading story. While we generally don’t comment on another vendor’s blogs, even if they assert things that we find questionable, we felt we had to respond to this one.…

Oracle embraced Apache Hadoop this week with the announcement of the Oracle Big Data Appliance that includes an open source distribution of Apache Hadoop.

We welcome Oracle to the Apache Hadoop community and look forward to their participation in the growing Hadoop ecosystem. We hope that Oracle will commit to using the official releases of Hadoop from the Apache Foundation. We believe that such a commitment will allow their customers to extract the most possible value from their Hadoop Appliances and facilitate the rapid growth of the Hadoop ecosystem.…

Hi Folks,

Hortonworks is a fast-growing software company that is looking for new talent that can make a positive impact on our company whether in development, QA and test, support and training or on the business side of the operations.  We recently updated the careers section of our website, adding a number of exciting job openings. We are very interested in filling each of these roles with great people as soon as possible.…

I’m pleased to announce that we’ve become a sponsor of the Apache Software Foundation (ASF). The ASF has been fundamental to Apache Hadoop’s success and our team’s ability to meet our goals since our inception as the Yahoo! Hadoop team in 2006. This is why we convinced Yahoo! to become an Apache Platinum Sponsor back in 2007, which it remains to this day. Now that we are operating as an independent company and continuing to benefit from Apache’s support, we made it a priority to continue to sponsor Apache.…

Interest in Hortonworks and Apache Hadoop continues to rise. This past week, I presented at two conferences and had a number of requests to share our slides. Both presentations are now posted on slideshare.net and linked to in this blog.

The first conference was the Cowen Big Data Day in New York City. The slides for this presentation are available here. The Cowen Group is a leading financial services and investment banking firm.…

Hi Folks,

I’m talking at a storage conference this month and I’d like to see if crowdsourcing will generate interesting examples and studies that I can include in my presentation.

What I’d like is interesting cases where HDFS has been compared to other storage technologies. Especially interested in cases where the decision was made to deploy HDFS rather than to buy an alternative technology.  Also interested in any large deployments where HDFS is being used for interesting things beyond being the serving layer for MapReduce and HBase.  …

We get asked a lot of questions about how to select Apache Hadoop worker node hardware. During my time at Yahoo!, we bought a lot of nodes with 6*2TB SATA drives, 24GB RAM and 8 cores in a dual socket configuration. This has proven to be a pretty good configuration. This year, I’ve seen systems with 12*2TB SATA drives, 48GB RAM and 8 cores in a dual socket configuration. We will see a move to 3TB drives this year.…
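
To make the two configurations above easier to compare, here is a minimal Python sketch of the arithmetic. The drive counts, RAM and core counts come from the post. The `WorkerNode` class and its field names are hypothetical, and the usable-capacity figure assumes HDFS's default replication factor of 3 while ignoring space set aside for the OS and intermediate data, so treat it as a rough upper bound and not a sizing rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerNode:
    """Hypothetical description of one Hadoop worker node (illustration only)."""
    name: str
    drives: int      # number of SATA drives
    drive_tb: float  # capacity of each drive, in TB
    ram_gb: int      # total RAM, in GB
    cores: int       # total cores across both sockets

    @property
    def raw_tb(self) -> float:
        # Raw disk capacity before any replication.
        return self.drives * self.drive_tb

    def usable_tb(self, replication: int = 3) -> float:
        # Assumption: every block is stored `replication` times (HDFS default is 3).
        return self.raw_tb / replication

    @property
    def ram_per_core_gb(self) -> float:
        return self.ram_gb / self.cores

    @property
    def drives_per_core(self) -> float:
        return self.drives / self.cores


# The two configurations mentioned in the post.
older = WorkerNode("6*2TB node", drives=6, drive_tb=2.0, ram_gb=24, cores=8)
newer = WorkerNode("12*2TB node", drives=12, drive_tb=2.0, ram_gb=48, cores=8)

for node in (older, newer):
    print(
        f"{node.name}: raw={node.raw_tb:.0f}TB, "
        f"usable~{node.usable_tb():.0f}TB at 3x replication, "
        f"RAM/core={node.ram_per_core_gb:.0f}GB, "
        f"drives/core={node.drives_per_core:.2f}"
    )
```

The newer configuration doubles disk and memory per node while keeping the core count fixed, so both storage and RAM per core double.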

Hi Folks,

I’d like to congratulate Arun Murthy on his very popular Hadoop Summit talk. SlideShare.net reports that his presentation has gone viral. They originally promoted it as the most discussed SlideShare.net presentation on LinkedIn, and yesterday they promoted it as the most tweeted-about presentation. In both cases, the presentation was moved up to the front page.

Arun is a Hortonworks founder and MapReduce expert. His talk does a great job of highlighting some of the current limitations in MapReduce and then outlining the roadmap for improving areas such as scalability, high availability, cluster utilization and support for paradigms other than MapReduce.…

Hi Folks,

Things are going really well at Hortonworks.  We’re in our new office, connected to our data center of nearly 1000 nodes (thanks Yahoo!) and working away on our new computers.  We’ve gotten a lot done in a very small amount of time.  Along with our excellent G&A team, a key reason we’ve gotten so much done is that our founders have really stepped up and are taking responsibility for getting their teams moving.…

More news. We’ve put the Hortonworks slides from the Hadoop Summit on slideshare.net for those that are interested in seeing them:

Hortonworks Hadoop Summit 2011 Keynote – Eric14 (my keynote)

Crossing the Chasm: Hadoop for the Enterprise – Sanjay Radia

Next Generation Apache Hadoop MapReduce – Arun C. Murthy

Introducing HCatalog (Hadoop Table Manager) – Alan Gates

HDFS Federation and Other Features – Suresh Srinivas and Sanjay Radia…

Wow, Hortonworks day one! Our first day of being “on the record”. It’s been a busy, but very productive day. Now that we are talking publicly about Hortonworks, there has been a LOT of interest in what we’re doing from analysts and journalists. So far the feedback we’ve received has been very positive.

I haven’t been able to read every article but a few have caught my eye that I wanted to share.…

We’re glad to have finally launched Hortonworks after months of planning and speculation. I thought I’d use the opportunity of my first Hortonworks blog to lay out who we are and what we’re all about.

Our History

Hortonworks was formed by Yahoo! and Benchmark Capital in June 2011 in order to accelerate the development and adoption of Apache Hadoop. We believe that Apache Hadoop will become the de facto platform for storing, managing and analyzing “big data,” namely the exploding volume of data being generated daily by organizations around the globe.…
