Apache Tez

A Framework for YARN-Based Data Processing Applications in Hadoop

Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem, such as Apache Hive and Apache Pig, as well as third-party software vendors, to express fit-to-purpose data processing applications in a way that meets their demands for fast response times and extreme throughput at petabyte scale.

What Tez Does

Apache Tez provides a developer API and framework for writing native YARN applications that bridge the spectrum of interactive and batch workloads. It allows applications to scale from gigabytes to petabytes of data and from tens to thousands of nodes. The Apache Tez component library lets developers create Hadoop applications that integrate with YARN and perform well within mixed-workload Hadoop clusters.

Because Tez is extensible and embeddable, it gives developers the freedom to express highly optimized, fit-to-purpose data processing applications, giving them an advantage over general-purpose, end-user-facing engines such as MapReduce and Spark. It also offers a customizable execution architecture in which complex computations are expressed as dataflow graphs, and which allows dynamic performance optimizations based on real information about the data and the resources required to process it.
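To make the idea of a dataflow graph concrete, here is a minimal sketch in plain Python. It does not use the Tez API: the vertex functions, the VERTICES table, and run_dag are hypothetical names chosen for illustration, and the graph runs in a single process rather than on a cluster. It only shows the general shape of the idea, assuming each vertex is a function that runs once all of its upstream vertices have produced output.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical vertices: each is a plain function from its inputs to one output.
def tokenize(lines):
    return [word.lower() for line in lines for word in line.split()]

def count(words):
    return Counter(words)

def vocabulary(words):
    return len(set(words))

def summary(counts, distinct):
    return {"top": counts.most_common(1)[0][0], "distinct": distinct}

# Vertex name -> (function, names of the upstream vertices that feed it).
# "tokenize" fans out to two vertices, which "summary" joins again.
VERTICES = {
    "tokenize": (tokenize, ["source"]),
    "count": (count, ["tokenize"]),
    "vocabulary": (vocabulary, ["tokenize"]),
    "summary": (summary, ["count", "vocabulary"]),
}

def run_dag(vertices, source):
    """Run every vertex after all of its upstream vertices have finished."""
    results = {"source": source}
    sorter = TopologicalSorter({name: deps for name, (_, deps) in vertices.items()})
    for name in sorter.static_order():
        if name in results:  # the source data is already available
            continue
        func, deps = vertices[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results

results = run_dag(VERTICES, ["tez runs dags", "hive runs on tez", "tez is fast"])
print(results["summary"])
```

The fan-out and join in this graph are the kind of shape that does not fit a single map-and-reduce step. Because the whole graph is known before it runs, an engine has room to plan and adjust execution across vertices.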


Hive with Tez

As the de facto standard for SQL-in-Hadoop, Apache Hive has been optimized to serve both batch and interactive queries at petabyte scale. As of the 0.13 release, Hive embeds Tez so that it can translate complex SQL statements into highly optimized, purpose-built data processing graphs that balance performance, throughput, and scalability across a wide range of use cases and dataset sizes. This advance was a key driver of the Stinger Initiative, a broad community effort that included contributions from 145 engineers across 44 different organizations. Tez helps make Hive interactive.

Tez and an Open Community

Originally developed by Hortonworks, the Apache Tez project entered the Apache Incubator in February 2013 and graduated to a top-level project in July 2014. In a short time, Tez has gathered 31 committers who represent a who’s who of leading Hadoop companies, including Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo. The substantial contribution from this open community has propelled Tez to become a cornerstone of core Apache projects like Apache Hive and Apache Pig, and it has been embraced by other important open-source projects like Cascading. There is much more to come.

How Tez Works

The motivations, architecture, and performance gains of Apache Tez for data processing in Hadoop extend well beyond Hive and Pig, and the project has set the standard for true integration with YARN for interactive workloads. We invite you to learn more about Tez through the tutorials and the project page.

