The Hortonworks Blog

Posts categorized by: Tez

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs.

HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of the platform spanning data access to security to governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of Modern Data Architecture (MDA).…

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published on our website in 2014, the following were the most popular:

  • Improving Spark for Data Pipelines with Native YARN Integration.

    Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate an improved Apache Spark with the help of the pluggable execution context.

  • HDP 2.2 A Major Step Forward for Enterprise Hadoop

    Tim Hall outlines six months of innovation and new features across Apache Hadoop and its related projects.

  • It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0!

    In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD & Memory tiers, support for long-running services in YARN and rolling upgrades—the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways.…

    With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. Apache Tez supports YARN-based, high performance batch and interactive data processing applications in Hadoop that need to handle datasets scaling to terabytes or petabytes.

    The Apache community just released Apache Pig 0.14.0, and the main feature is Pig on Tez.…

    Last week Hortonworks presented the second of our eight Discover HDP 2.2 webinars. Alan Gates and Raj Bains discussed the Stinger.next initiative and new Apache Hive features for speed, scale and SQL that are included in Hortonworks Data Platform 2.2.

    After an overview of HDP 2.2, Alan discussed what the Apache community accomplished with the original Stinger initiative and how that momentum continues in Stinger.next.

    Alan and Raj then discussed details on three areas of innovation currently underway in the Apache Hive project:

    • For SQL – transactions with ACID semantics
    • For Speed – the cost based optimizer
    • For Scale – dynamic query optimization

    Here is the complete recording of the webinar.

    Here is the presentation deck.…

    Arsalan Tavakoli-Shiraji, customer engagement lead overseeing business development activities at Databricks, is our guest blogger today. In this blog, he discusses our expanded partnership built around Apache Spark on Apache Hadoop in three areas: customers, engineering, and open source.

    Today Databricks and Hortonworks are announcing an expanded partnership built around Apache Spark; allow me to explain why we’re thrilled to be embarking on this journey with them.

    When we started Databricks last summer, Apache Spark was in the early stages of enterprise adoption.…

    A few weeks back, we outlined a broad initiative to invest in Spark in the context of the Hadoop ecosystem. We intend to facilitate a more efficient utilization of Hadoop cluster resources for ETL and/or Data Pipeline workloads when using Spark. Many of the lessons learned while building out MapReduce, Apache Tez and other YARN data-processing frameworks can be applied to the Spark project in order to optimize its resource utilization and to make it a good multi-tenant citizen within a YARN-based Hadoop cluster.…

    We recently hosted a Spark webinar as part of the YARN Ready series, aimed at a technical audience including developers of applications for Apache Hadoop and Apache Hadoop YARN. During the event, a number of good questions surfaced that we wanted to share with our broader audience in this blog. Take a look at the video and slides along with these questions and answers below.

    You can listen to the entire webinar recording here.…

    Hortonworks Data Platform Version 2.2 represents yet another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the last six months of innovation, includes more than a hundred new features, and closes thousands of issues across Apache Hadoop and its related projects.

    Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by key capabilities required of an enterprise data platform — spanning Governance, Security and Operations.…

    Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

    Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…

    Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.…

    The Apache Tez community is thrilled to announce the release of version 0.5 of the project. We’re referring to this as “the developer release” because it’s all about developers. The community focused on meeting the key needs of developers using Tez to create their applications and engines. Tez 0.5 includes clean and intuitive developer APIs, easy debugging, extensive documentation and deployment with rolling upgrades.

    Apache Hadoop YARN paved the way for Apache Tez.…

    Speed, Scale, and SQL Semantics

    Since its inception and graduation as a Top-Level Project (TLP) of the Apache Software Foundation (ASF) in September 2010, Apache Hive has been steadily improving in speed, scale, and SQL semantics to meet enterprise requirements for both interactive and batch queries at Hadoop scale.

    It has become a de facto standard for SQL queries over petabytes of data stored in Hadoop. It is a compliant SQL engine that offers developers a comprehensive and familiar set of SQL semantics for Apache Hadoop.…

    This week we continue our YARN webinar series with a detailed introduction to and developer overview of Apache Tez. Designed to express fit-to-purpose data processing logic, Tez enables batch and interactive data processing applications spanning TB to PB scale datasets. Tez offers a customizable execution architecture that allows developers to express complex computations as dataflow graphs and allows for dynamic performance optimizations based on real information about the data and the resources required to process it.…

    Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This represents a major step forward for the project and reflects the momentum built by a broad community of developers from not only Hortonworks but also Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.

    What is Apache Tez and why is it useful?

    Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle TB to PB scale datasets.…
