The Hortonworks Blog

Posts categorized by: Tez

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster.

HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to efficiently store and interact with their data in a single repository, simultaneously using a wide variety of engines.…

As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines.

Until recently, optimizing Hive queries focused mostly on data layout techniques such as partitioning and bucketing or using custom file formats.

In the last couple of years, driven largely by the innovation of the Hive community around the Stinger initiative, Hive query time has improved dramatically, enabling Hive to support both batch and interactive workloads at speed and at scale.…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs.

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines.…

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most important of those data access applications: the de facto standard for interactive SQL queries over petabytes of data in Hadoop.…

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs.

HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of the platform spanning data access to security to governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of Modern Data Architecture (MDA).…

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published on our website in 2014, the following were the most popular:

  • Improving Spark for Data Pipelines with Native YARN Integration.

    Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improvements to Apache Spark made with the help of the pluggable execution context.

  • HDP 2.2 A Major Step Forward for Enterprise Hadoop

    Tim Hall outlines six months of innovation and new features across Apache Hadoop and its related projects.

  • It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0!

    In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD & Memory tiers, support for long-running services in YARN and rolling upgrades—the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways.…

    With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. Apache Tez supports YARN-based, high performance batch and interactive data processing applications in Hadoop that need to handle datasets scaling to terabytes or petabytes.

    The Apache community just released Apache Pig 0.14.0, and the main feature is Pig on Tez.…

    Last week Hortonworks presented the second of our eight Discover HDP 2.2 webinars. Alan Gates and Raj Bains discussed the Stinger.next initiative and new Apache Hive features for speed, scale and SQL that are included in Hortonworks Data Platform 2.2.

    After an overview of HDP 2.2, Alan discussed what the Apache community accomplished with the original Stinger initiative and how that momentum continues in Stinger.next.

    Alan and Raj then discussed details on three areas of innovation currently underway in the Apache Hive project:

    • For SQL – transactions with ACID semantics
    • For Speed – the cost based optimizer
    • For Scale – dynamic query optimization

    Here is the complete recording of the webinar.

    Here is the presentation deck.…

    Arsalan Tavakoli-Shiraji, customer engagement lead overseeing business development activities at Databricks, is our guest blogger today. In this blog, he discusses our expanded partnership built around Apache Spark on Apache Hadoop in three areas: customers, engineering, and open source.

    Today Databricks and Hortonworks are announcing an expanded partnership built around Apache Spark; allow me to explain why we’re thrilled to be embarking on this journey with them.

    When we started Databricks last summer, Apache Spark was in the early stages of enterprise adoption.…

    A few weeks back, we outlined a broad initiative to invest in Spark in the context of the Hadoop ecosystem. We intend to facilitate a more efficient utilization of Hadoop cluster resources for ETL and/or Data Pipeline workloads when using Spark. Many of the lessons learned while building out MapReduce, Apache Tez and other YARN data-processing frameworks can be applied to the Spark project in order to optimize its resource utilization and to make it a good multi-tenant citizen within a YARN-based Hadoop cluster.…

    We recently hosted a Spark webinar as part of the YARN Ready series, aimed at a technical audience including developers of applications for Apache Hadoop and Apache Hadoop YARN. During the event, a number of good questions surfaced that we wanted to share with our broader audience in this blog. Take a look at the video and slides along with these questions and answers below.

    You can listen to the entire webinar recording here.…

    Hortonworks Data Platform Version 2.2 represents yet another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the last six months of innovation, includes more than a hundred new features, and closes thousands of issues across Apache Hadoop and its related projects.

    Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by key capabilities required of an enterprise data platform — spanning Governance, Security and Operations.…

    Apache Tez has been selected as a winner for 2014’s InfoWorld Bossie award. The “Bossies” identify the Best of Open Source software every year and are awarded by a panel of InfoWorld Test Center editors and industry expert reviewers. The Bossie awards celebrate game-changing open source software projects in different domains, and Apache Tez was selected in the Big Data Tools category.

    Last year, Apache Hadoop with YARN as its architectural center was awarded a Bossie.…