The Hortonworks Blog

More from Bikas Saha

We are excited to announce that the Apache™ Tez community voted to release version 0.4 of the software.

Apache Tez is an alternative to MapReduce that provides a powerful framework for executing a complex topology of tasks for data access in Hadoop. Version 0.4 incorporates the feedback from extensive testing of Tez 0.3, released just last month.

This release is especially meaningful because it coincides with the completion of the Stinger Initiative (a collaborative community effort involving 145 developers across 44 companies) and the upcoming release of Apache Hive 0.13.…

The Apache Tez community has voted to release 0.3 of the software.

Apache™ Tez is a replacement for MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important step toward making the software ready for wider adoption, focusing on fundamentals and ironing out several key functions. The major action areas in this release were:

  • Security. Apache Tez now works on secure Hadoop 2.x clusters using the built-in security mechanisms of the Hadoop ecosystem.…

    This post is the fifth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

    Case Study: Automatic Reduce Parallelism
    Motivation

    Distributed data processing is dynamic by nature and it is extremely difficult to statically determine optimal concurrency and data movement methods a priori.…

    This post is the second in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

    Overview

    Apache Tez models data processing as a dataflow graph, with the vertices in the graph representing the processing of data and the edges representing the movement of data between processing steps.…
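
    The dataflow-graph idea can be illustrated with a small, generic sketch. This is not the Tez API: the function `run_dataflow` and the vertex names `tokenize` and `count` are hypothetical, chosen only to show vertices as processing steps and edges as data movement between them, under the assumption that the graph is acyclic and small enough to run in a single process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(vertices, edges, sources):
    """Run every vertex once, in dependency order (hypothetical helper).

    vertices: name -> function taking a list of inputs, one per incoming edge
    edges:    list of (producer, consumer) pairs describing data movement
    sources:  name -> initial input for vertices that have no incoming edge
    """
    # Record, for each vertex, which vertices feed data into it.
    incoming = {name: [] for name in vertices}
    for producer, consumer in edges:
        incoming[consumer].append(producer)

    outputs = {}
    # TopologicalSorter accepts a mapping of node -> predecessors and
    # yields each node only after all of its predecessors.
    for name in TopologicalSorter(incoming).static_order():
        if incoming[name]:
            inputs = [outputs[p] for p in incoming[name]]
        else:
            inputs = [sources[name]]
        outputs[name] = vertices[name](inputs)
    return outputs


# A two-vertex graph: split lines into words, then count each word.
vertices = {
    "tokenize": lambda inputs: [w for line in inputs[0] for w in line.split()],
    "count": lambda inputs: {w: inputs[0].count(w) for w in set(inputs[0])},
}
edges = [("tokenize", "count")]

result = run_dataflow(vertices, edges, {"tokenize": ["a b a", "b a"]})
print(result["count"])
```

    Keeping the processing logic (vertices) separate from the data movement (edges) is what lets a framework change how data is routed without touching the processing code; the sketch above only mimics that separation and says nothing about how Tez schedules or distributes work.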

    This post is the first in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

    In this post we introduce the motivation behind Apache Tez (http://incubator.apache.org/projects/tez.html) and provide some background around the basic design principles for the project.…

    YARN is part of the next-generation Hadoop cluster compute environment. It creates a generic and flexible resource management framework to administer the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application-specific computations on a shared cluster. Thus, resource allocation lies at the heart of YARN.

    YARN ultimately opens up Hadoop to additional compute frameworks, like Tez, so that an application can optimize compute for its specific requirements.…
