Hortonworks on Apache Hadoop


Apache ZooKeeper 3.4.3 Released

For those of you new to Apache ZooKeeper, it is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. To learn more about ZooKeeper, please visit the Apache ZooKeeper homepage.

As part of stabilizing the Apache ZooKeeper 3.4 branch, ZooKeeper 3.4.3 has just been released. It is a bug-fix release on the 3.4 branch that fixes 17 issues, one of which (ZOOKEEPER-1367) is critical and can lead to data inconsistencies across your ZooKeeper servers. If you are currently using any of the 3.4.* releases, please make sure you upgrade to 3.4.3.…

Read More

Delivering on Hadoop .Next: Benchmarking Performance

In our previous blogs and webinars we have discussed the significant improvements and architectural changes coming to Apache Hadoop .Next (0.23). To recap, the major ones are:

  • Federation for Scaling HDFS – HDFS has undergone a transformation to separate Namespace management from the Block (storage) management to allow for significant scaling of the filesystem. In previous architectures, they were intertwined in the NameNode.
  • NextGen MapReduce (aka YARN) – MapReduce has undergone a complete overhaul in hadoop-0.23, including a fundamental change to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of MapReduce jobs or a DAG of jobs. Thus, Hadoop becomes a general-purpose data-processing platform that can support MapReduce as well as other application execution frameworks such as MPI, graph processing, and iterative processing.
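
To make the ResourceManager/ApplicationMaster split more concrete, here is a toy Python model. It is only an illustration: the class names echo the terms above, but the methods (`allocate`, `release`, `run`) and the notion of "waves" are hypothetical and are not the Hadoop API.

```python
class ResourceManager:
    """Global arbiter of cluster capacity (toy model, not the Hadoop API)."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        # Grant as many containers as are currently free, up to the request.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        # Containers return to the shared pool when their tasks finish.
        self.free += count


class ApplicationMaster:
    """Per-application coordinator: negotiates containers and tracks its own tasks."""

    def __init__(self, name, task_count, resource_manager):
        self.name = name
        self.pending = task_count
        self.rm = resource_manager

    def run(self):
        # Run tasks in waves, each wave limited by what the RM can grant.
        waves = 0
        while self.pending > 0:
            granted = self.rm.allocate(self.pending)
            if granted == 0:
                raise RuntimeError("no containers available")
            self.pending -= granted
            self.rm.release(granted)
            waves += 1
        return waves


rm = ResourceManager(total_containers=4)
for name, tasks in [("mapreduce-job", 10), ("graph-job", 3)]:
    am = ApplicationMaster(name, tasks, rm)
    print(name, "finished in", am.run(), "waves")
```

The point of the sketch is the division of labor: the ResourceManager knows nothing about tasks or job types, so a MapReduce job and a graph-processing job can share the same cluster through the same interface.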

Read More

Solving the Data Problem in a Big Way

I recently joined Hortonworks as VP of Corporate Strategy, and I wanted to share my thoughts as to what attracted me to Hortonworks.

For me, it’s important to 1) work with a top-notch team and 2) focus on unique market-changing business opportunities.

Hortonworks has a strong team of technical founders (Eric14, Alan, Arun, Devaraj, Mahadev, Owen, Sanjay, and Suresh) doing impressive work within the Apache Hadoop community. Hortonworks also has an impressive Board of Directors that includes folks like Peter Fenton, Mike Volpi, Jay Rossiter, Rob Bearden, as well as our most recent board member Paul Cormier (Red Hat’s President of Products and Technology).…

Read More

Hortonworks Recognized as a Leader in Forrester Wave Report

I am pleased to report that Hortonworks has been named a leader in the recently released Forrester Wave report on Enterprise Hadoop Solutions. We scored well across all three rating areas: current offering, market presence and strategy.

We appreciate the recognition, particularly this sentence that highlighted our role in the marketplace: “(Hortonworks) is the technology leader and ecosystem building for the entire Hadoop industry and has recently released its Hortonworks Data Platform, which incorporates purely open-source Apache Hadoop software.”

Being named a Leader in the Forrester Wave on Enterprise Hadoop Solutions is one of many achievements for Hortonworks over the past seven months (stay tuned for a blog on this topic). While we are proud of our past, we are much more focused on our future. We know that we must continue to drive innovation and work with the community to deliver high-quality Apache Hadoop releases. …

Read More

Paul Cormier Joins Hortonworks Board of Directors

I am pleased to announce that Paul Cormier has joined the Hortonworks Board of Directors. Paul is currently President, Products and Technologies at Red Hat, where he leads the company’s engineering and products business units. Paul has an exceptional background in building enterprise-class open source software. He also has helped Red Hat achieve tremendous revenue growth by enabling a rich ecosystem of partners. We are pleased to add such a talented and experienced open source professional to our board. His insights and guidance will play an important role in helping Hortonworks achieve our stated objective of enabling Apache Hadoop to become the foundation for the next generation enterprise data platform.

Welcome Paul!

~E14…

Read More

Hadoop Summit 2012 is Coming

Hi Folks,

I’m happy to report that Hadoop Summit will be back for its 5th year. This year, Hortonworks and Yahoo are jointly hosting the conference, which will take place on June 13th and 14th at the San Jose Convention Center.

This year’s event promises to be bigger and better than ever. We have extended the conference to a second day, added session tracks, and expect to showcase even more compelling and useful presentations. You will be really impressed when you see what we have planned.…

Read More

Delivering the Next Generation of Apache Hadoop

Today we announced our plans to release a public preview of the Hortonworks Data Platform (HDP) version 2. HDP v2 will leverage Apache Hadoop 0.23, which is the first major update to Hadoop in more than three years. Among other advancements, HDP v2 will include the NextGen MapReduce architecture, HDFS NameNode HA and HDFS Federation. It will also include the most up-to-date stable components, including HCatalog, HBase, Hive and Pig, all fully integrated and tested at scale.

In order to avoid confusion, let me explain the two versions of HDP:

  • HDP v1 is based upon Apache Hadoop 1.0 (which comes from the 0.20.205 branch). It is the most stable, production-ready version of Hadoop and is currently found in many large enterprise deployments. HDP v1 is currently available as a private technology preview. A public technology preview will be made available later this quarter.

Read More

Shaun Connolly Joins Hortonworks

I’m pleased to announce that Shaun Connolly has joined our executive management team as VP of Corporate Strategy. Shaun is a veteran enterprise software and open source executive who comes to us from VMware and previously held positions at SpringSource and JBoss.

As VP of Corporate Strategy, Shaun will be responsible for helping us to achieve our business objectives by guiding corporate strategy and identifying new market opportunities for Apache Hadoop.  Shaun will also play a critical role in helping us position and grow the Hortonworks Data Platform (HDP) as a next-generation enterprise data management solution, helping organizations maximize the value from the wealth of data flowing throughout their enterprise.

Welcome aboard Shaun!

~E14…

Read More

Apache Hadoop Reaches Milestone: Release 1.0.0

Congratulations! The Hadoop Community has given itself a big holiday present: Release 1.0.0! This release has been six years in the making, and has involved:

  • Hard work and cooperation from dozens of software developers and contributors from across the industry, including of course Doug Cutting and Mike Cafarella’s early work in Nutch and the founding Hadoop team at Yahoo, Doug, Owen O’Malley and many others, with leadership from Eric14.  Special thanks to all the Hadoop committers.
  • Commitment to stability, joined with testing and indispensable production experience at scale, at industry-leading companies like Yahoo!, Facebook, LinkedIn, and others, including hundreds of millions of compute-hours and exabytes of data processed.
  • Feedback from hundreds of knowledgeable users, data scientists, systems engineers and architects.
  • Commitment to the philosophy and practice of open source from Google, who published their seminal papers and have long supported Apache.

Read More

WebHDFS – HTTP REST Access to HDFS

Motivation

Apache Hadoop provides a high-performance native protocol for accessing HDFS. While this is great for Hadoop applications running inside a Hadoop cluster, users often want to connect to HDFS from the outside. For example, some applications have to load data into and out of the cluster, or interact with the data stored in HDFS from the outside. Of course they can do this using the native HDFS protocol, but that means installing Hadoop and a Java binding with those applications. To address this we have developed an additional protocol to access HDFS using an industry-standard RESTful mechanism, called WebHDFS. As part of this, WebHDFS takes advantage of the parallelism that a Hadoop cluster offers. Further, WebHDFS retains the security that the native Hadoop protocol offers. It also fits well into the overall strategy of providing web services access to all Hadoop components.…
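
As a rough illustration of what REST access looks like from outside the cluster, the Python sketch below only builds WebHDFS request URLs; it does not contact a cluster. The `/webhdfs/v1` path prefix and the `op` query parameter follow the WebHDFS REST convention, while the helper name `webhdfs_url`, the host name, the port 50070, and the file paths are assumptions made up for this example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    # The operation and any extra options travel as query parameters.
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but keeps '/' separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Assumed NameNode address and paths, for illustration only.
print(webhdfs_url("namenode.example.com", 50070, "/user/alice/data.csv", "OPEN"))
print(webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS"))
```

Any HTTP client (curl, a browser, or a script in any language) can then issue a GET against such a URL, which is what removes the need to install Hadoop and a Java binding alongside the application.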

Read More

Guest Blog: The Elephant is in the Room

We’ve been looking for the elephant in the room for some time. We knew he was there, but we just couldn’t find him. It’s clear that he is now here and his name is Hortonworks. As such, we are very excited to announce today that Index Ventures has made an investment in Hortonworks.

The elephant toy – Hadoop – has become a household name in the Big Data sector these days and we’ve been tracking it for some time at Index. The Big Data world is complex and there are many components of it, but at the core of it all is Apache Hadoop’s revolutionary compute and storage architecture for data. We think that this might be one of the most significant trends in data architecture in a decade.

To understand Apache Hadoop’s impact on compute and storage architectures, it’s important to understand where it came from and why it was needed in the first place.…

Read More

Good Times at ApacheCon 2011

I spent some time last week at ApacheCon NA 2011 in Vancouver, BC. It was a good experience and I enjoyed catching up with friends and colleagues involved in the Hadoop project and also meeting some of the executives of the Apache Software Foundation in person. It is clear that the Apache community is thriving and that interest in Hadoop remains very high.

Hortonworks is committed to supporting Apache and we are pleased to have been a gold sponsor of this event. I delivered the day two keynote at ApacheCon on the success of Apache Hadoop. To view my presentation please visit Slideshare.net.

~E14
@jeric14, @hortonworks 

Read More

Apache Hadoop Meets Informatica Data Parsing

As the framework architects and developers of Apache Hadoop MapReduce, we are always looking for ways to simplify the complex tasks associated with large-scale processing of data. We want users and organizations to spend their time on analyzing their growing data to gain valuable insights, not on menial tasks such as massaging their data for consumption or tediously parsing complex structures in their data. The Informatica HParser technology is extremely valuable in this regard.

For those new to Apache Hadoop, MapReduce is a parallel computing framework for processing large volumes of data. It deals with the four V’s of big data (as Forrester described) that present challenges to existing data systems, namely: volume, velocity, variety and variability. Together with the Hadoop Distributed File System (HDFS) and a handful of other important Apache Hadoop projects, it provides a massively scalable and highly reliable platform for storing, processing, managing and ultimately analyzing the ever-increasing data coming not only from transactional systems but also unstructured data in the form of server logs, customer interaction records, social media updates, email, PDFs, CDRs and so forth.…

Read More

Delivering on our Promises

Back in late June when Hortonworks was officially announced at Hadoop Summit, we explained that our strategy was going to focus on accelerating the development and adoption of Apache Hadoop. We made bold statements about the opportunities that Apache Hadoop had to become the de facto platform for big data. We even predicted that half of the world’s data would be processed by Apache Hadoop within five years.

We also talked about how in order for all of that to happen, we needed to address the technical and knowledge gaps that exist. We needed to heavily invest in engineering to make Hadoop easier to install, manage and use for enterprises and more open and extensible for a growing ecosystem of technology and service providers.

Today we are making a series of announcements that are an important first step in delivering on these promises:

Read More
