The legacy Hortonworks Forum is now closed. The site will be taken offline on January 31, 2016.

Hive / HCatalog Forum

Tez: MR or non-MR

  • #25507
    Wei Tan

    Hi, I’ve been looking at the features of Tez.
    1. From this blog, it seems that Tez is a non-MR framework that enables the execution of a DAG in ONE job. This is not feasible in the MapReduce framework, since an MR job can only consist of two steps, i.e., map and reduce. So you cannot do map-map-reduce or map-reduce-map-reduce in a single job.

    2. However, when I look at the Tez manual, it says: “The Tez AMPoolService or Tez Service is a service that launches and makes available a pool of pre-launched MapReduce AMs (Tez AMs). These AMs in the pool can, in turn, be configured to pre-allocate a number of containers to allow jobs to be launched and completed faster. To use the Tez Service, the clients must submit the jobs to this service instead of the ResourceManager.”

    From this, it seems that Tez is still conceptually under the MR framework. Performance is improved compared to the out-of-the-box MR framework by (1) pre-launching AMs for MapReduce jobs and (2) container reuse for MR tasks.

    So which understanding is true, 1 or 2?
    Thanks for the clarification in advance!

  • #26104
    Wei Tan

    Could someone kindly reply to my question? :) Thanks.


    I feel both points are true. HDP2.0/Tez0.1 mainly provides pre-launching of MR AMs and containers (if configured), as well as container reuse. This is mainly what you describe in (2). Point (1), I feel, is the near-term goal: to make Tez an independent framework.

    Just my understanding.
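
To make point 1 of the question a bit more concrete, here is a toy, in-memory Python sketch of the difference. It is not the Tez or Hadoop API; `run_mr_job`, `run_dag` and the sample functions are hypothetical names made up for illustration. The assumption is only what the question states: an MR job is exactly one map step followed by one reduce step, whereas a DAG-style job can chain more steps in a single job.

```python
from collections import defaultdict


def run_mr_job(records, mapper, reducer):
    """One classic MR job (toy version): exactly one map step, then one reduce step."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def run_dag(records, stages):
    """Toy DAG job (a linear chain here): any number of map/reduce stages in ONE job."""
    data = records
    for kind, fn in stages:
        if kind == "map":
            data = [out for rec in data for out in fn(rec)]
        elif kind == "reduce":
            groups = defaultdict(list)
            for key, value in data:
                groups[key].append(value)
            data = [out for key in sorted(groups) for out in fn(key, groups[key])]
        else:
            raise ValueError(f"unknown stage kind: {kind}")
    return data


# Sample map-reduce-map-reduce pipeline: count words, then group words by count.
def split_words(line):
    return [(word, 1) for word in line.split()]


def sum_counts(word, ones):
    return [(word, sum(ones))]


def swap_to_count(pair):
    word, count = pair
    return [(count, word)]


def collect_words(count, words):
    return [(count, sorted(words))]


lines = ["a b a", "b c"]

# With plain MR, the pipeline has to be split into two separate jobs.
job1_output = run_mr_job(lines, split_words, sum_counts)
two_jobs = run_mr_job(job1_output, swap_to_count, collect_words)

# With a DAG-style framework, the same four steps run as one job.
one_dag = run_dag(lines, [
    ("map", split_words),
    ("reduce", sum_counts),
    ("map", swap_to_count),
    ("reduce", collect_words),
])

print(two_jobs)  # [(1, ['c']), (2, ['a', 'b'])]
print(one_dag)   # same result, produced by a single job
```

Both variants compute the same result. The difference is only in how the work is packaged: two chained jobs in the first case, one job with four stages in the second.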
