The Hortonworks Blog

More from Bikas Saha

Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez.…

Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project and reflects the momentum built by a broad community of developers, not only from Hortonworks but also from Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.

What is Apache Tez and why is it useful?

Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle TB- to PB-scale datasets.…

We are excited to announce that the Apache™ Tez community voted to release version 0.4 of the software.

Apache Tez is an alternative to MapReduce that provides a powerful framework for executing a complex topology of tasks for data access in Hadoop. Version 0.4 incorporates the feedback from extensive testing of Tez 0.3, released just last month.

This release is especially meaningful because it coincides with completion of the Stinger Initiative (a collaborative community effort involving 145 developers across 44 companies) and the upcoming release of Apache Hive 0.13.…

The Apache Tez community has voted to release version 0.3 of the software.

Apache™ Tez is a replacement for MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important step toward making the software ready for wider adoption, focusing on fundamentals and ironing out issues in several key functions. The main focus areas in this release were:

  • Security. Apache Tez now works on secure Hadoop 2.x clusters using the built-in security mechanisms of the Hadoop ecosystem.
This post is the fifth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Case Study: Automatic Reduce Parallelism

Motivation

Distributed data processing is dynamic by nature, and it is extremely difficult to determine the optimal concurrency and data movement methods a priori.…
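To make the idea of choosing reduce parallelism at runtime concrete, here is a minimal Python sketch: once some of the upstream (map) tasks have finished, extrapolate the total output size from what they produced and size the reducer count so each reducer receives roughly a target amount of input. The function `estimate_reducers` and its parameters (`desired_bytes_per_reducer`, `min_reducers`) are hypothetical and are not Tez's actual API. The sketch also assumes parallelism can only be lowered, never raised, since upstream tasks have already partitioned their output for the statically configured number of reducers.

```python
import math


def estimate_reducers(completed_output_bytes, completed_tasks, total_tasks,
                      configured_reducers, desired_bytes_per_reducer,
                      min_reducers=1):
    """Hypothetical helper: pick a reducer count from observed upstream output.

    Assumes reducer parallelism can only be lowered, because upstream tasks
    have already partitioned their output into `configured_reducers` parts.
    """
    if completed_tasks <= 0 or total_tasks <= 0:
        # No upstream task has finished yet: keep the static estimate.
        return configured_reducers
    # Extrapolate total upstream output from the tasks completed so far.
    expected_total = completed_output_bytes * total_tasks / completed_tasks
    # Reducers needed so that each receives roughly the desired input size.
    needed = math.ceil(expected_total / desired_bytes_per_reducer)
    # Clamp between the floor and the statically configured count.
    return max(min_reducers, min(configured_reducers, needed))


# Example: 10 of 100 map tasks finished, producing 1 GiB in total.
GiB = 2 ** 30
print(estimate_reducers(1 * GiB, 10, 100, configured_reducers=200,
                        desired_bytes_per_reducer=GiB))  # prints 10
```

Here a static guess of 200 reducers would leave most of them processing tiny inputs; the runtime estimate of about 10 GiB of total output cuts that to 10. The trade-off is that the estimate is only as good as the sample of tasks that have finished when the decision is made.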

This post is the second in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Overview

Apache Tez models data processing as a dataflow graph, with the vertices in the graph representing data processing steps and the edges representing the movement of data between those steps.…
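As a rough illustration of the dataflow-graph model, the following Python sketch represents a job as vertices (processing steps) joined by edges (data movement) and derives one order in which the vertices could run while respecting their data dependencies. The `DAG` class, its methods, and the vertex names are hypothetical and are not Tez's actual Java API. The edge labels loosely echo data movement patterns such as scatter-gather and broadcast; a real engine would also attach processors, parallelism and other properties to vertices and edges, and its scheduler may overlap vertices rather than run them strictly in sequence.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DAG:
    """Hypothetical dataflow graph: vertices process data, edges move it."""
    vertices: list = field(default_factory=list)
    # (source, destination, movement) tuples; movement labels are illustrative.
    edges: list = field(default_factory=list)

    def add_vertex(self, name):
        self.vertices.append(name)

    def add_edge(self, src, dst, movement="scatter-gather"):
        self.edges.append((src, dst, movement))

    def topological_order(self):
        """Kahn's algorithm: a vertex is ready once all its inputs exist."""
        indegree = {v: 0 for v in self.vertices}
        children = {v: [] for v in self.vertices}
        for src, dst, _ in self.edges:
            indegree[dst] += 1
            children[src].append(dst)
        ready = deque(v for v in self.vertices if indegree[v] == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for child in children[v]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(order) != len(self.vertices):
            raise ValueError("graph has a cycle; not a valid DAG")
        return order


# Example: join two scanned inputs, then aggregate the result.
dag = DAG()
for name in ["scan_orders", "scan_customers", "join", "aggregate"]:
    dag.add_vertex(name)
dag.add_edge("scan_orders", "join")                  # partitioned shuffle
dag.add_edge("scan_customers", "join", "broadcast")  # small side sent to every join task
dag.add_edge("join", "aggregate")
print(dag.topological_order())
# prints ['scan_orders', 'scan_customers', 'join', 'aggregate']
```

Modeling the whole job as a single graph, rather than as a chain of separate jobs, is what lets an engine see that both scans can run independently and that the join only needs to wait for its two inputs.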

This post is the first in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

In this post we introduce the motivation behind Apache Tez (http://incubator.apache.org/projects/tez.html) and provide some background on the basic design principles for the project.…

YARN is part of the next-generation Hadoop cluster compute environment. It provides a generic and flexible resource management framework for administering the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application-specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.

YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that an application can optimize compute for its specific requirements.…