Hadoop Ecosystem

Industry news, partner stories, buzz and happenings

Hortonworks Summer Internship 2012

As a first time intern, I can undoubtedly say that Hortonworks was the perfect place for me to gain real world work experience and have the chance to team up with many incredibly talented, driven people. Of course, I didn’t get to fully interact with everyone in the company in the three months that I was here but even after such a short time it is clear to me that it is the welcoming atmosphere and the determined team here that have allowed Hortonworks to achieve so many goals in just over a year.…

During the ‘Future of Apache Hadoop’ webinar series, Hortonworks founders and core committers will discuss the future of Hadoop and related projects including Apache Pig, Apache Ambari, Apache Zookeeper and Apache Hadoop YARN.

Apache Hadoop has rapidly evolved to become the leading platform for managing, processing and analyzing big data. Consequently there is a thirst for knowledge on the future direction for Hadoop related projects. The Hortonworks webinar series will feature core committers of the Apache projects discussing the essential components required in a Hadoop Platform, current advances in Apache Hadoop, relevant use-cases and best practices on how to get started with the open source platform.…

This week, I spent some time and enjoyed speaking at the Softgrid 2012 conference in San Francisco. It was a great collection of speakers and attendees and opened my eyes to some Hadoop driven possibilities that not only differentiate utilities companies but will also transform our day-to-day lives.

The conference focused on software (in this case intelligent analytics) as a competitive advantage to enable value and growth for utilities.  These often large and historically conservative organizations have moved beyond the notion that their sole business is to distribute electric power efficiently, reliably, and cost-effectively to consumers.…

Do you want to understand how Apache Hadoop can benefit your business? Do you understand the relationship between Hadoop and your Big Data initiatives? Are you struggling to explain the benefits of Hadoop to your management team?

At Hortonworks, we are constantly being asked by business and executive audiences to explain use cases, benefits and components of Hadoop. While the interest in Big Data and Hadoop grows, this urgent and often pressing demand for a map to create value and differentiation amplifies.…

As organizations continue to ramp the number of MapReduce jobs processed in their Hadoop clusters, we often get questions about how best to share clusters. I wanted to take the opportunity to explain the role of Capacity Scheduler, including covering a few common use cases.

Let me start by stating the underlying challenge that led to the development of Capacity Scheduler and similar approaches.

As organizations become more savvy with Apache Hadoop MapReduce and as their deployments mature, there is a significant pull towards consolidation of Hadoop clusters into a small number of decently sized, shared clusters.…

I wanted to take this opportunity to say thanks to the more than 2,200 attendees, speakers and sponsors that helped to make Hadoop Summit 2012 a great success. There was tremendous buzz throughout the conference; exceeding the excitement levels of all past Hadoop conferences. It’s a great indicator for the future of Apache Hadoop and the broader big data ecosystem.

The content from this conference was outstanding, from the opening keynotes to the last round of breakout sessions.…

What’s possible with all this data?

Data Integration is a key component of the Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running. Ok, I have a cluster… now what? Do I write a script to move the data? What is the language? Isn’t this just ETL with HDFS as another target?Well, yes…

Sure you can write custom scripts to perform a load, but that is hardly repeatable and not viable in the long term.…

In Shaun Connolly’s post about balancing community innovation and enterprise stability, he discussed the importance of an open source project forging ahead with big improvements that are expected to be initially buggy and incomplete functionally but then stabilize over time. In the case of Apache Hadoop 2.0, currently in community Alpha release, the big improvements have been underway for the past 3 years and include such things as:

  • Next-gen MapReduce (aka YARN) that opens up Hadoop’s job processing architecture to other application workloads beyond MapReduce,
  • New HDFS pipe-line to support append and flush,
  • HDFS federation and performance improvements that enable Hadoop to scale to 1000’s more nodes in a cluster, and
  • High availability improvements that address some of the single point of failure issues that are often used as examples of how Hadoop may not be as enterprise-ready as it could be.…
  • The fifth annual Hadoop Summit drew to a close last week, with over 2200 Hadoopniks in attendance. While there were many innovations demonstrated, for me the best action was about Pig, HCatalog and Hive from Hortonworks and Twitter.

    At the Hadoop Summit Pig Meetup, Twitter announced Ambrose, which now includes an excellent graph layout of Pig EXPLAIN data. This visualization can be used to debug and better understand your Pig scripts.…

    Hadoop Summit is just around the corner and by that, I mean next week! There is still time to register for the conference but please do it soon as the conference is filling up quickly. Today is also the last day in which online registration will remain open. After today, you will need to register on-site at the conference itself.

    This year’s Hadoop Summit conference, now in its fifth year, promises to be the biggest and best yet.…

    Having worked at JBoss and Red Hat from 2004 to 2008 and SpringSource and VMware from 2008 to 2011, I’ve been focused on the world of open source software for a long while. I’ve been blessed to be able to serve enterprise customer needs with high quality open source software such as JBoss Application Server, Hibernate, Drools, Apache Web Server, Apache Tomcat, Spring … and now Apache Hadoop.

    As specific open source technologies mature and their use becomes mainstream, it becomes increasingly important to understand and communicate the balancing act that needs to happen between community innovation and enterprise stability.…

    In case you didn’t see the news today, Hadoop Summit announced record ecosystem support for this year’s conference. The original and world’s largest Apache Hadoop conference, now in its fifth year, is being sponsored this year by more than 40 traditional and open source software and services companies.

    Hortonworks and our co-host Yahoo! would like to thank the following companies for helping to make Hadoop Summit possible:

    The third installment of the Hortonworks executive video series features Arun C. Murthy, co-founder of Hortonworks and VP of Apache Hadoop for the Apache Software Foundation. In this video, Arun shares his view of the power of Apache Hadoop and provides some insight into the future direction of MapReduce, including the ability to support alternate programming paradigms.

    As part of Big Data Week, Dan Harvey of the London Hadoop User Group organised an afternoon session for the usergroup, which we were glad to sponsor, along with Canonical and Facegroup. I had the pleasure of presenting my view of the current and future status of Apache Hadoop to an audience that ranged from those curious about Hadoop to heavy users.

    Every talk of the day was excellent, from the use cases by Datasift, Mendeley and MusicMetric, to the talk by Francine Bennett of MastodonC on the CO2 footprint of different cloud computing infrastructures, including a live dashboard on the current CO2/hour of many cloud infrastructure sites.…

    We just released the second video in the Hortonworks Executive Series. This one features Matt Foley, Test and Release Engineering Manager for Hortonworks.

    In this video, Matt provides an overview of Hortonworks Data Platform (HDP), including a summary of the Apache Hadoop components included in the distribution and the testing processes involved in the release process. Matt also provides an overview of Apache Ambari, an open source project that is adding monitoring and management capabilities to Apache Hadoop.…

    Go to page:« First...1213141516