The Hortonworks Blog

Posts categorized by : Tez

Oracle and Hortonworks continue to work on bringing the latest ELT and real-time transactional data streaming capabilities to the Hortonworks Data Platform (HDP).   Recently Oracle completed certification testing for HDP 2.2 for both Oracle Data Integrator and Oracle GoldenGate for Big Data, both integral parts of the Oracle Data Integration product portfolio. These releases certified on HDP 2.2 are the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level of enterprise integration.…

Sumeet Kumar Agrawal, principal product manager for Big Data Edition product at Informatica, is our guest blogger. In this blog, explains how Informatica’s Big Data Edition integrates with Tez and allow for significant performance gains.

Informatica Big Data Edition’s codeless visual development environment accelerates the ability of enterprises to take advantage of amazing innovations in big data to solve new challenges with skill sets that already exist within many organizations. Informatica natively integrates with big data platforms like Hadoop and NoSQL to enable next-generation big data solutions, including data warehouse optimization and 360 customer analytics.…

Kristen Hardwick, Vice President of Big Data Solutions at Spry, Inc is our guest blogger. In this blog, Kristen shares performance analysis during Spryinc’s evaluation of Apache Hive with Tez as a fast query engine.

In early 2014, Spry developed a solution that heavily utilized Hive for data transformations. When the project was complete, three distinct data sources were integrated through a series of HiveQL queries using Hive 0.11 on HDP 2.0.…

Historically, the strength of a platform lies in the abilities of developers to learn, try, and build against the platform APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and efforts of the developer community that is driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern data architecture.

A successful developer community leads to a successful platform, and at Hortonworks we are committed to reducing the friction to speed up the success of our customers.…

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment.

Tez View: Understand and Optimize Jobs in your Cluster

The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify which jobs consume the most resources and which are the best candidates to optimize.

With the Tez View you can quickly spot Hive or Pig jobs that are taking the longest, writing the most data or consuming the most CPU.…

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers’ views to the end user.

Join us Thursday March 26 at 10am PT, for an online technical workshop where we will cover 3 key integration points of Apache Ambari including Stacks, Views and Blueprints and deliver working examples of each.…

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster.

HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to efficiently store and interact with their data in a single repository, simultaneously using a wide variety of engines.…

As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines.

Until recently, optimizing Hive queries focused mostly on data layout techniques such as partitioning and bucketing or using custom file formats.

In the last couple of years, driven largely by the innovation of the Hive community around the Stinger initiative, Hive query time has improved dramatically, enabling Hive to support both batch and interactive workloads at speed and at scale.…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

This is the second post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce the theme of running service-workloads in YARN to set context for deeper discussion in subsequent blogs.

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines.…

Hortonworks Data Platform (HDP) provides Hadoop for the Enterprise, with a centralized architecture of core enterprise services, for any application and any data. HDP is uniquely built around native YARN services to enable a centralized architecture through which multiple data access applications interact with a shared data set. Apache Hive is one of the most important of those data access applications—the defacto standard for interactive SQL queries over petabytes of data in Hadoop.…

This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs.

HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of the platform spanning data access to security to governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of Modern Data Architecture (MDA).…

We take pride in producing valuable technical blogs and sharing it with a wider audience. Of all the blogs published in 2014 on our website, the following were most popular:

  • Improving Spark for Data Pipelines with Native YARN Integration.

    Gopal Vijayaraghavan and Oleg Zhurakousky demonstrate improved Apache Spark, which with the help of the pluggable execution context.

  • HDP 2.2 A Major Step Forward for Enterprise Hadoop

    Tim Hall outlines six months of innovation and new features across Apache Hadoop and its related projects.

  • It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0 !

    In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD & Memory tiers, support for long-running services in YARN and rolling upgrades—the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways.…

    With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. Apache Tez supports YARN-based, high performance batch and interactive data processing applications in Hadoop that need to handle datasets scaling to terabytes or petabytes.

    The Apache community just released Apache Pig 0.14.0,and the main feature is Pig on Tez.…