April 04, 2017

Top 5 Performance Boosters with Apache Hive LLAP

Now Generally Available in HDP 2.6

Hive LLAP (Low Latency Analytical Processing) is Hive’s new architecture that delivers MPP performance at Hadoop scale through a combination of optimized in-memory caching and persistent query executors that scale elastically within YARN clusters.

Hive LLAP — MPP Performance at Hadoop Scale  

Since Hive LLAP was introduced as a technical preview in Hortonworks Data Platform (HDP) 2.5, many users have discovered that LLAP delivers low latency, concurrency and overall performance that were not possible with earlier versions of HDP. Now that LLAP is generally available with HDP 2.6, let’s take some time to look at the top 5 performance boosters you’re missing out on if you’re not using LLAP.

1. Dynamic Runtime Filtering For Speed and for Scale

SQL join performance is the key to scalable and efficient query processing in a data warehouse environment. The straightforward approach of joining all records is slow and doesn’t scale, and it forces users to partition data carefully to avoid performance problems.

Hive Dynamic Runtime Filtering provides a better, fully dynamic solution. With Dynamic Runtime Filtering, Hive automatically builds a Bloom filter based on actual dimension table values and uses this filter to eliminate rows that cannot match. Records that have no chance of matching are simply skipped and are never evaluated in downstream join or shuffle operations, resulting in massive CPU and network savings. The filter is completely transparent to the user and requires no rewrite of existing SQL queries.
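To make the idea concrete, here is a minimal Python sketch of a Bloom-filter pre-filter in front of a join. It is an illustration of the general technique only, not Hive's implementation: the `BloomFilter` class, the `filtered_join` helper, the filter size and the row layout are all assumptions made up for this example.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def filtered_join(fact_rows, dim_rows, fact_key, dim_key):
    """Join fact rows to dimension rows, skipping fact rows that cannot match.

    Hypothetical helper: rows are plain dicts, and the join is an inner join.
    Returns the joined rows and the number of fact rows that passed the filter.
    """
    bloom = BloomFilter()
    dim_by_key = {}
    for row in dim_rows:
        bloom.add(row[dim_key])
        dim_by_key[row[dim_key]] = row

    # Rows that cannot match are dropped here, before the join itself runs.
    survivors = [r for r in fact_rows if bloom.might_contain(r[fact_key])]

    # The exact lookup removes any false positives the filter let through.
    joined = [{**r, **dim_by_key[r[fact_key]]}
              for r in survivors if r[fact_key] in dim_by_key]
    return joined, len(survivors)
```

In this sketch the filter is built from the (small) dimension side and applied to the (large) fact side, so most non-matching fact rows never reach the join. A false positive only costs one wasted lookup; it never changes the result.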

Dynamic Runtime Filtering 

The performance benefits are dramatic, especially for highly selective queries over large datasets. In our in-progress tests, runtime filtering reduced the runtime of TPC-DS Query 32 from 160 seconds to 7 seconds at 10 terabyte scale, more than a 20x speedup. Stay tuned for a full update on Hive performance with HDP 2.6 within a few weeks.

2. Zero-ETL Analytics on CSV and JSON data

Fast analytics on Hadoop have always come with one big catch: they require up-front conversion to a columnar format like ORCFile, which is time-consuming, complex and limits your agility.

HDP 2.6 introduces the LLAP Dynamic Text Cache, which converts CSV or JSON data into LLAP’s optimized in-memory format on the fly. Caching is dynamic: the queries your users run determine what data is cached, and no administrator intervention is required. After text data is cached, analytics run just as fast as if you had converted it to ORCFile.

The LLAP Dynamic Text Cache brings agility back to Hadoop, with the performance that your users demand.
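The query-driven caching described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how LLAP stores data: `TextColumnCache`, `total` and the column-per-list layout are hypothetical names and choices made for illustration.

```python
import csv
import io


class TextColumnCache:
    """Caches columns parsed from CSV text, populated only when queries ask."""

    def __init__(self):
        self._columns = {}    # (source_name, column_name) -> list of values
        self.parse_count = 0  # how many times raw text had to be parsed

    def get_column(self, source_name, csv_text, column_name):
        key = (source_name, column_name)
        if key not in self._columns:
            # Cache miss: parse the text once and keep the column in memory.
            self.parse_count += 1
            reader = csv.DictReader(io.StringIO(csv_text))
            self._columns[key] = [row[column_name] for row in reader]
        return self._columns[key]


def total(cache, source_name, csv_text, column_name):
    """A stand-in for a query: sum one numeric column."""
    values = cache.get_column(source_name, csv_text, column_name)
    return sum(float(v) for v in values)
```

The first query against a column pays the text-parsing cost; later queries against the same column read the in-memory copy. Nothing is cached until a query touches it, which mirrors the "no administrator intervention" behavior described in the text.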

3. Cache 4x More Data with LLAP SSD Cache

DRAM is still the most over-subscribed resource in the data center, and as much as we’d like to just throw all data in RAM, it’s too expensive. Advancements in SSD technology like the Intel Optane™ promise to make SSD almost as fast as RAM at a fraction of the cost.

HDP 2.6 introduces the LLAP SSD Cache, which lets you combine RAM and SSD into a giant pool of memory with all of the other benefits the LLAP cache brings. With the LLAP SSD Cache, a typical server profile can cache 4x more data, letting you process larger datasets or support more users.
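One way to picture a combined RAM and SSD pool is a two-tier cache in which entries evicted from the fast tier are demoted to the larger, slower tier instead of being discarded. The sketch below assumes a simple least-recently-used policy and uses an in-memory dictionary as a stand-in for the SSD. The source does not describe LLAP's actual eviction policy, so `TwoTierCache` and its slot counts are purely illustrative.

```python
from collections import OrderedDict


class TwoTierCache:
    """LRU cache with a small fast tier and a larger slow tier."""

    def __init__(self, ram_slots, ssd_slots):
        self.ram = OrderedDict()  # fast tier
        self.ssd = OrderedDict()  # slow tier (a dict standing in for SSD)
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots

    def put(self, key, value):
        self.ssd.pop(key, None)   # an entry lives in one tier at a time
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_slots:
            # Demote the least recently used RAM entry rather than drop it.
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)  # only now is data really lost

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)
            self.put(key, value)  # promote back into the fast tier
            return value
        return None

    def __len__(self):
        return len(self.ram) + len(self.ssd)
```

With, say, 2 RAM slots and 6 SSD slots, the cache holds 8 entries in total, four times what the RAM tier alone could hold. The ratio here is chosen to echo the 4x figure in the text and is not a measurement.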

4. Full Decimal Vectorization

In SQL, Decimals are commonly used in financial calculations where exact results are mandatory. In HDP 2.6, Hive LLAP fully supports vectorizing queries that include Decimal data, meaning analytics using Decimal data run up to 3x faster.
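A short Python illustration shows why exact decimal types matter for financial work. It uses Python's standard `decimal` module as a stand-in for SQL Decimal columns and says nothing about how Hive vectorizes them. The variable names are made up for this example.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum carries a small rounding error.
float_total = 0.1 + 0.2
float_is_exact = (float_total == 0.3)

# Decimal arithmetic works in base 10 and keeps these digits exact.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_total == Decimal("0.3"))
```

Here `float_is_exact` is `False` while `decimal_is_exact` is `True`. Exactness of this kind usually costs more CPU per value than floating point, which is why speeding up Decimal processing is worthwhile.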

5. More SIMD Optimizations For a Faster Inner Loop

Intel and other chip vendors have been building SIMD (Single Instruction Multiple Data) vectorization into the chip for several years now, a trend that is continuing with the addition of specialized extensions for deep learning and other advanced analytics. Hive LLAP employs a number of optimizations specific to the AVX2 instruction set, available on certain processors made after 2013. These optimizations deliver up to 400% faster integer comparisons, 50% faster conditional expressions and much more. With LLAP and the right hardware, analytics finish faster than they ever could before.

LLAP One-Click Enablement Makes Getting Started Easy

Getting started with LLAP is simple in Ambari 2.5. Enable Interactive Query in the Hive service page, select the number of nodes to run LLAP on, and you’re set.

Enable LLAP with One Click in Ambari 
