Hive / HCatalog Forum

Tez: MR or non-MR

  • #25507
    Wei Tan
    Member

    Hi, I’ve been looking at the features of Tez.
    1. From this blog, http://hortonworks.com/blog/introducing-tez-faster-hadoop-processing/, it seems that Tez is a non-MR framework that can execute a whole DAG in ONE job. This is not feasible in the MapReduce framework, since an MR job can consist of only two steps, i.e., map and reduce. So you cannot do map-map-reduce or map-reduce-map-reduce in a single job.

    2. However, when I look at the Tez manual here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_installing_manually_book/content/rpm-chap-tez.html , it says: “The Tez AMPoolService or Tez Service is a service that launches and makes available a pool of pre-launched MapReduce AMs ( Tez AMs ). These AMs in the pool can, in turn, be configured to pre-allocate a number of containers to allow jobs to be launched and completed faster. To use the Tez Service, the clients must submit the jobs to this service instead of the ResourceManager.”

    It seems that Tez is still conceptually within the MR framework. Performance is improved over the out-of-the-box MR framework by (1) pre-launching AMs for MapReduce jobs and (2) container reuse for MR tasks.

    So which understanding is true, 1 or 2?
    Thanks for the clarification in advance!
    Wei

  • #26104
    Wei Tan
    Member

    Could someone kindly reply to my question? :) Thanks

    #27829

    I feel both points are true. HDP 2.0 / Tez 0.1 mainly provides pre-launching of MR AMs and containers (if configured), as well as container reuse. This is mainly what you mention in (2). (1), I feel, is the near-term goal: to make it an independent framework.

    Just my understanding.
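
The constraint described in point 1 of the question can be illustrated with a small sketch. It uses the simplified model from the question itself (each MR job is one map phase followed by at most one reduce phase) and counts how many separate jobs a linear pipeline would need under that model. The function name `split_into_mr_jobs` and the stage labels are hypothetical, chosen only for illustration; this is not Tez or Hadoop API code.

```python
def split_into_mr_jobs(stages):
    """Greedily pack a linear pipeline of 'map'/'reduce' stages into jobs.

    Assumed model (from the question, simplified): a job holds one map
    phase optionally followed by one reduce phase. A reduce with no
    preceding map in its job is treated as having an implicit identity map.
    """
    jobs = []
    current = []
    for stage in stages:
        if stage not in ("map", "reduce"):
            raise ValueError(f"unknown stage: {stage!r}")
        # A stage fits the current job only if the job is empty,
        # or the job is a lone map and the stage is a reduce.
        fits = (not current) or (current == ["map"] and stage == "reduce")
        if not fits:
            jobs.append(current)
            current = []
        current.append(stage)
    if current:
        jobs.append(current)
    return jobs


if __name__ == "__main__":
    for pipeline in (
        ["map", "reduce"],
        ["map", "map", "reduce"],
        ["map", "reduce", "map", "reduce"],
    ):
        jobs = split_into_mr_jobs(pipeline)
        print("-".join(pipeline), "needs", len(jobs), "MR job(s):", jobs)
```

Under this model, map-map-reduce and map-reduce-map-reduce each split into two jobs, with intermediate output handed from one job to the next. A framework that runs the whole DAG as one job, as the blog post describes for Tez, would not need that split.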
