Tez: MR or non-MR


This topic contains 2 replies, has 2 voices, and was last updated by  Saurabh Gupta 1 year, 7 months ago.

  • #25507

    Wei Tan
    Member

    Hi, I’ve been looking at the features of Tez.
    1. From this blog, http://hortonworks.com/blog/introducing-tez-faster-hadoop-processing/ , it seems that Tez is a non-MR framework that enables the execution of a DAG in ONE job. This is not feasible in the MapReduce framework, since an MR job can only consist of two steps, i.e., map and reduce. So you cannot do map-map-reduce or map-reduce-map-reduce in a single job.

    2. However, when I look at the manual of Tez here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_installing_manually_book/content/rpm-chap-tez.html , it says: “The Tez AMPoolService or Tez Service is a service that launches and makes available a pool of pre-launched MapReduce AMs ( Tez AMs ). These AMs in the pool can, in turn, be configured to pre-allocate a number of containers to allow jobs to be launched and completed faster. To use the Tez Service, the clients must submit the jobs to this service instead of the ResourceManager.”

    From this, it seems that Tez is still conceptually under the MR framework. Performance is improved compared to the out-of-the-box MR framework by (1) pre-launching AMs for MapReduce jobs and (2) container reuse for MR tasks.
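
To make the contrast in point 1 concrete, here is a minimal Python sketch of a map-reduce-map-reduce pipeline run two ways: as two separate two-step jobs, and as one chained job. This is a toy model only, not Tez or Hadoop code. The names `run_stage`, `run_dag_job`, and the word-count pipeline are hypothetical, and a linear chain is just the simplest possible DAG.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One map phase followed by one shuffle-and-reduce phase (a classic MR job)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    return [reducer(key, values) for key, values in sorted(groups.items())]


def run_dag_job(records, stages):
    """Toy single-job DAG: run a linear chain of (mapper, reducer) stages."""
    data = records
    for mapper, reducer in stages:
        data = run_stage(data, mapper, reducer)
    return data


# Hypothetical pipeline: count words, then count how many words share each count.
def split_words(line):
    return [(word, 1) for word in line.split()]


def sum_values(key, values):
    return (key, sum(values))


def key_by_count(pair):
    _word, count = pair
    return [(count, 1)]


lines = ["a b a", "b c a"]

# Classic MR style: two separately submitted jobs. In a real cluster the
# intermediate result would be written out and read back between them.
word_counts = run_stage(lines, split_words, sum_values)
count_histogram = run_stage(word_counts, key_by_count, sum_values)

# DAG style: the same four steps described up front and submitted as one job.
dag_result = run_dag_job(
    lines, [(split_words, sum_values), (key_by_count, sum_values)]
)
```

Both approaches compute the same result here. The difference the blog post describes is in how the work is submitted and scheduled, which this sketch does not model.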

    So which understanding is true, 1 or 2?
    Thanks for the clarification in advance!
    Wei

Viewing 2 replies - 1 through 2 (of 2 total)


  • #27829

    I feel both points are true. HDP2.0/Tez0.1 mainly provides pre-launching of MR AMs and containers (if configured), as well as container reuse. This is mainly what you mention in (2). I feel (1) is the near-term goal: to make Tez an independent framework.

    Just my understanding.

    #26104

    Wei Tan
    Member

    Could someone kindly reply to my question? :) Thanks
