The Hortonworks Blog

This post is from Vinod Kumar Vavilapalli of Hortonworks and Chris Douglas and Carlo Curino of Microsoft Research.

Great news from the Apache Hadoop YARN community! A paper describing Apache Hadoop YARN was accepted at the 2013 ACM Symposium on Cloud Computing (SoCC 2013), where it won the award for best paper! Here’s the title and abstract:

Title

Apache Hadoop YARN: Yet Another Resource Negotiator [Industrial Paper]

Abstract

The initial design of Apache Hadoop was tightly focused on running massive, MapReduce jobs to process a web crawl.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Enis Soztutar discusses Apache HBase, built for random read/write access to data in billions of rows and millions of columns.

Enis began using Apache Hadoop in 2006. Now, Enis is a Hortonworks engineer and Apache HBase project management chair. He has also been a committer to Apache Hadoop since 2007 and to HBase since 2012.…

This post is the fourth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

The previous couple of blogs covered Tez concepts and APIs.…

This is a guest blog post from our partner, Actuate. They’ve been generous enough to create some great Hadoop tutorials on the Open Source BIRT project that use the Hortonworks Sandbox.

By now, Apache™ Hadoop® has become synonymous with the first stage of Big Data: storing, processing and managing huge volumes and varieties of structured and unstructured data. Yet the data stored by Hadoop remains unreadable to the average business user.…

A crucial requirement of any enterprise technology is to ensure the simplest possible management and operation. We think that simplicity means two things: 1) integration with existing infrastructure and tools and 2) leveraging existing knowledge and skills.

Download the beta release of Ambari SCOM Management Pack here.

Ambari (http://incubator.apache.org/ambari/) was introduced as an Apache incubator project with the aim of developing the best management tool for Hadoop, applying our principles of open source community development for rapid innovation and solving the right problems for enterprises.…

Thanks to all those who joined in person and virtually for the Apache Ambari Meetup at Hortonworks this week. We talked tech, we saw demos, we laughed, we cried, we ate pizza.

The central theme of the night was the newly added support for Hadoop 2. Ambari now has:

  • Hadoop 2 Stack: Ambari adds support for installing, managing and monitoring a Hadoop 2 Stack.
  • NameNode HA: Ambari configures NameNode High Availability based on the QJM support built into HDFS2.
  • YARN: Ambari manages YARN Service lifecycle and automatically deploys the MapReduce2 framework.

Personally, I’ve followed the Go Programming Language (golang) with increasing interest for a while and have been itching to really sink my teeth into it. I’ve always felt you never learn any programming language for real unless it’s used to build a fairly large, real-world solution. It’s the only way to tackle real issues and gain some confidence for future battles with destiny… FTR, my first real project in Java was Hadoop, circa 2006.…

A lot of people ask me: how do I become a data scientist? I think the short answer is: as with any technical role, it isn’t necessarily easy or quick, but if you’re smart, committed and willing to invest in learning and experimentation, then of course you can do it.

In a previous post, I described my view on “What is a data scientist?”: it’s a hybrid role that combines the “applied scientist” with the “data engineer”. …

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Alan Gates, Hortonworks Co-Founder and Apache Pig Committer, discusses using Apache Pig for efficiently managing MapReduce workloads. Pig is ideal for transforming data in Hadoop: joining it, grouping it, sorting it and filtering it.

Alan explains how Pig takes scripts written in a language called Pig Latin and translates those into MapReduce jobs.

Listen to Alan describe the future of Pig in Hadoop 2.0.…

This post is the third in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Apache Tez models data processing as a dataflow graph, with the vertices in the graph representing processing of data and edges representing movement of data between the processing.…

‘The world is being digitized’ proclaimed Geoffrey Moore in his keynote at Hadoop Summit 2012 over a year ago. His belief is that we are moving away from an analog society where we collect only casual recording of events to one that is digital, where everything is captured. It is our belief that Hadoop is one of the key technologies powering this shift to a digital society.

There is almost an expectation that we capture the pics, vids and conversations that run before us. …

Are you a Hadoop hot shot? Are you the one everyone looks to for help on their Hadoop projects? Are you looking to showcase your talent to the world?

Then just maybe we have a great option for you. We recently published the Hortonworks Sandbox tutorials on GitHub. Now it’s your turn. We invite you to add your own Hadoop tutorials or to improve on the ones that we’ve published.…

YARN and the Hortonworks Data Platform 2.0 enable one Hadoop cluster to share data and analytical processing capabilities across the enterprise. Organizations can use the Hortonworks Data Platform 2.0 to:

  • Pool all enterprise data into one scalable and reliable storage platform
  • Enable all analytical processing IN the data platform
  • Provide access to this data and processing across all business units

The Capacity Scheduler (CS) ensures that groups of users and applications will get a guaranteed share of the cluster, while maximizing overall utilization of the cluster.…

There’s an old proverb you’ve likely heard about blind men trying to identify an elephant. Depending on the version of the proverb you’ve heard, the elephant is misidentified variously as rope, walls, pillars, baskets, brushes and more. Oddly, no one identified it as a next-generation enterprise data platform, but I guess it is an old proverb.

The Hadoop elephant is a platform though, and as such the proverb holds true. Depending on your perspective, it has different capabilities, components and integration points to meet your requirements.…

We’ve been hosting a series of webinars focusing on how to make Apache Hadoop a viable enterprise platform that powers modern data architectures.

Implementing a modern data architecture with Hadoop means that it must deeply integrate with existing technologies, leverage existing skills and investments and provide key services. This guest post from David Smith, Vice President of Marketing and Community at Revolution Analytics, shares his perspective on the role of the Data Scientist in a Big Data world.…
