September 17, 2014

Introducing Apache Tez 0.5: The Developer Release

The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

Apache Hadoop YARN paved the way for Apache Tez. With Hadoop 2, Tez has proven itself rock-solid for users of Apache Hive with Tez and Apache Pig with Tez. This release extends the benefits of Tez to many more developers who aim to take advantage of its reliability, scale, and performance within their engineering projects.

Now developers can take full advantage of the Hadoop platform with YARN as its architectural center. YARN enables purpose-built applications to run within a shared execution environment, and Tez enables developers to write purpose-built applications for the data processing domain.

Stable Developer APIs

Applications like Apache Hive, Apache Pig and Cascading use Tez’s core directed acyclic graph (DAG) APIs for a variety of batch and interactive use cases. The resulting wealth of feedback from users of these applications and community members involved in those projects has been incorporated into the Tez code.
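
For readers new to the DAG model, here is a minimal sketch of the idea in Python. It is not the Tez API: the `Dag` class, its methods, and the vertex names are all hypothetical, chosen only to show how an application describes processing steps as vertices, connects them with edges, and gets an order in which every producer runs before its consumers.

```python
from collections import deque


class Dag:
    """Illustrative only (not the Tez API): a DAG of named processing vertices."""

    def __init__(self, name):
        self.name = name
        self.vertices = []      # kept in insertion order
        self.downstream = {}    # vertex -> list of consumer vertices

    def add_vertex(self, vertex):
        if vertex in self.vertices:
            raise ValueError(f"duplicate vertex: {vertex}")
        self.vertices.append(vertex)
        self.downstream[vertex] = []
        return self

    def add_edge(self, source, target):
        for v in (source, target):
            if v not in self.downstream:
                raise ValueError(f"unknown vertex: {v}")
        self.downstream[source].append(target)
        return self

    def execution_order(self):
        """Order vertices so producers precede consumers (Kahn's algorithm)."""
        pending = {v: 0 for v in self.vertices}
        for targets in self.downstream.values():
            for t in targets:
                pending[t] += 1
        ready = deque(v for v in self.vertices if pending[v] == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for t in self.downstream[v]:
                pending[t] -= 1
                if pending[t] == 0:
                    ready.append(t)
        if len(order) != len(self.vertices):
            raise ValueError("graph contains a cycle, so it is not a DAG")
        return order


def build_join_dag():
    """Hypothetical example: two scans feed a join, which feeds an aggregation."""
    return (
        Dag("join-example")
        .add_vertex("scan_orders")
        .add_vertex("scan_customers")
        .add_vertex("join")
        .add_vertex("aggregate")
        .add_edge("scan_orders", "join")
        .add_edge("scan_customers", "join")
        .add_edge("join", "aggregate")
    )
```

Calling `build_join_dag().execution_order()` lists both scans before the join and the join before the aggregation. A real engine would also run independent vertices, such as the two scans, in parallel, which this sketch does not attempt.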

Our testing and real-world experiences show that the core APIs are stable and should stand up to the challenge of even more widespread adoption. The Tez community plans to maintain backwards compatibility for these APIs, so developers, vendors and ISVs can continue to confidently build their applications with Tez.

Debugging: Local Execution & Performance Swim Lanes

Developing code without good debugging tools is challenging. In this release, we provide capabilities for debugging both application code and performance:

  • Local Execution – Tez adds support for local DAG execution that allows the entire execution to run inline within the same JVM. This allows the developer to attach a debugger to the DAG execution and use existing tools to debug every aspect of the execution. There is almost no difference between the executions in local and non-local modes. The local execution is essentially WYSIWYG.
  • Performance Debugging – The proverbial pain-point for most developers of distributed systems is how to diagnose poor performance. We are excited to add a swim lane tool that enables post-facto visualization of DAG execution, which can be used to drill down into possible performance problems.
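
To make the swim lane idea concrete, here is a rough Python sketch of the general technique. It does not reproduce the Tez tool or its input format. It assumes one lane per container and a hypothetical record layout of `(container, task, start, end)`, and the function names are invented for illustration.

```python
from collections import defaultdict


def build_swim_lanes(attempts):
    """Group (container, task, start, end) records into one time-ordered lane per container."""
    lanes = defaultdict(list)
    for container, task, start, end in attempts:
        if end < start:
            raise ValueError(f"{task}: end precedes start")
        lanes[container].append((start, end, task))
    return {c: sorted(spans) for c, spans in sorted(lanes.items())}


def idle_gaps(lanes):
    """Find intervals in each lane where the container sat idle between tasks."""
    gaps = {}
    for container, spans in lanes.items():
        gaps[container] = [
            (prev_end, next_start)
            for (_, prev_end, _), (next_start, _, _) in zip(spans, spans[1:])
            if next_start > prev_end
        ]
    return gaps


def render(lanes, width=40):
    """Draw each lane as one row of text, scaling the time axis to `width` columns."""
    t0 = min(s for spans in lanes.values() for s, _, _ in spans)
    t1 = max(e for spans in lanes.values() for _, e, _ in spans)
    scale = (width - 1) / max(t1 - t0, 1)
    rows = []
    for container, spans in lanes.items():
        row = [" "] * width
        for start, end, _ in spans:
            first = int((start - t0) * scale)
            last = int((end - t0) * scale)
            for i in range(first, last + 1):
                row[i] = "#"
        rows.append(f"{container:>12} |{''.join(row)}|")
    return "\n".join(rows)


# Hypothetical timings, for illustration only.
EXAMPLE_ATTEMPTS = [
    ("container_1", "map_0", 0, 4),
    ("container_2", "map_1", 0, 9),
    ("container_1", "reduce_0", 10, 12),
]
```

With the example records, `idle_gaps(build_swim_lanes(EXAMPLE_ATTEMPTS))` reports that `container_1` was idle from time 4 to time 10, waiting on the slower `map_1`. Long blank stretches in a lane are the kind of pattern a swim lane view makes easy to spot.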

Documentation with Javadocs

The community has worked hard to write extensive javadocs for all the APIs that are exposed by Tez. We also clarified the naming and packaging of the APIs to make them more intuitive.

Because code samples are the best form of documentation, we include a number of examples to showcase how to build applications using the Tez APIs. We wrote the examples from the point of view of a developer using Tez, in order to guide the reader on a path from basic to more complex use cases.

Deployment with Rolling Upgrades

Apache Tez has always been easy to deploy. It is a client-side YARN application, which means that there is nothing to install in the cluster. This is important from a usability standpoint but also from a safety point of view. It’s perfectly safe to try out Tez on any cluster (even your production cluster) because it is not going to change anything or leave behind any traces.

We have improved the packaging to support rolling upgrades of a Hadoop cluster. Rolling upgrades will soon be released for Apache Hadoop, allowing cluster administrators to upgrade a Hadoop cluster without any downtime. Tez, as a leading example of an engine running within the YARN framework, is ready to take advantage of that capability.

We thank our users, developers and contributors for helping us strengthen Tez during the early days while the developer tools matured. Prior releases proved Tez’s performance and scalability. We are confident that with this release, Tez is a stable, rock-solid framework for developers of big data applications. Now is the time for independent software vendors (ISVs) and developers to take full advantage of DAG-driven capabilities in Tez for their purpose-built applications on YARN.

— The Tez team
