Get fresh updates from Hortonworks by email

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.


Sign up for the Developers Newsletter

Once a month, receive latest insights, trends, analytics information and knowledge of Big Data.


Get Started


Ready to Get Started?

Download sandbox

How can we help you?

* I understand I can unsubscribe at any time. I also acknowledge the additional information found in Hortonworks Privacy Policy.
closeClose button
August 20, 2013
prev slideNext slide

Simple Hive ‘Cheat Sheet’ for SQL Users

UPDATE: This cheat sheet was so popular, we’ve created a PDF of the content below so you can print it and use it more easily. Download here.

If you’re already familiar with SQL then you may well be thinking about how to add Hadoop skills to your toolbelt as an option for data processing.

From a querying perspective, using Apache Hive provides a familiar interface to data held in a Hadoop cluster and is a great way to get started. Apache Hive is data warehouse infrastructure built on top of Apache Hadoop for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).

Naturally, there are a bunch of differences between SQL and HiveQL, but on the other hand there are a lot of similarities too, and recent releases of Hive bring that SQL-92 compatibility closer still.

To highlight that – and as a bit of fun to get started – below is a simple ‘cheat sheet’ (based on a simple MySQL reference such as this one) for getting started with basic querying for Hive. Here, we’ve done a direct comparison to MySQL, but given the simplicity of these particular functions, then it should be the same in essentially any SQL dialect.

Of course, if you really want to get to grips with Hive, then take a look at the full language manual.

Retrieving Information

Function MySQL Hive
Retrieving Information (General) SELECT from_columns FROM table WHERE conditions; SELECT from_columns FROM table WHERE conditions;
Retrieving All Values SELECT * FROM table; SELECT * FROM table;
Retrieving Some Values SELECT * FROM table WHERE rec_name = "value"; SELECT * FROM table WHERE rec_name = "value";
Retrieving With Multiple Criteria SELECT * FROM TABLE WHERE rec1 = "value1" AND rec2 = "value2"; SELECT * FROM TABLE WHERE rec1 = "value1" AND rec2 = "value2";
Retrieving Specific Columns SELECT column_name FROM table; SELECT column_name FROM table;
Retrieving Unique Output SELECT DISTINCT column_name FROM table; SELECT DISTINCT column_name FROM table;
Sorting SELECT col1, col2 FROM table ORDER BY col2; SELECT col1, col2 FROM table ORDER BY col2;
Sorting Reverse SELECT col1, col2 FROM table ORDER BY col2 DESC; SELECT col1, col2 FROM table ORDER BY col2 DESC;
Counting Rows SELECT COUNT(*) FROM table; SELECT COUNT(*) FROM table;
Grouping With Counting SELECT owner, COUNT(*) FROM table GROUP BY owner; SELECT owner, COUNT(*) FROM table GROUP BY owner;
Maximum Value SELECT MAX(col_name) AS label FROM table; SELECT MAX(col_name) AS label FROM table;
Selecting from multiple tables (Join same table using alias w/”AS”) SELECT, comment FROM pet, event WHERE =; SELECT, comment FROM pet JOIN event ON ( =


Function MySQL Hive
Selecting a database USE database; USE database;
Listing tables in a database SHOW TABLES; SHOW TABLES;
Describing the format of a table DESCRIBE table; DESCRIBE (FORMATTED|EXTENDED) table;
Creating a database CREATE DATABASE db_name; CREATE DATABASE db_name;
Dropping a database DROP DATABASE db_name; DROP DATABASE db_name (CASCADE);

Current SQL Compatibility

Command Line

Function Hive
Run Query hive -e 'select a.col from tab1 a'
Run Query Silent Mode hive -S -e 'select a.col from tab1 a'
Set Hive Config Variables hive -e 'select a.col from tab1 a' -hiveconf hive.root.logger=DEBUG,console
Use Initialization Script hive -i initialize.sql
Run Non-Interactive Script hive -f script.sql




Kevin says:

I see that the Hive cheat sheet is from august 2013, using Hive 0.10 + 0.11
Are there any new functions available? maybe on a newer cheat sheet?


Jules S. Damji says:

Yes, we’ve plans to update it to include an update version with latest Hive 1.0 features.

Singh says:
Your comment is awaiting moderation.

do we have updates for newer cheatsheet for hive ?

Stephanie says:

Is there an update yet?

Awadhesh says:

I have a database Test and table is employee. While running “hive -e ‘select a.col from employee\'” it throws error that table not exist. because I am not setting database name. Please help me.
How to pass database name in below query
hive -e ‘select a.col from tab1 a’

ayesamson says:

you may have to prefix your statement with hive -e ‘use database; select a.col from tab1 a’ in order to execute the statement against the specific database.

Anna Ch says:

hi! The pdf is gone 🙁

RAJA says:

How to get the MINIMUM value from two columns of a single row in HIVE

Austin says:

Hive supports the LEAST() function. So your query would look like “select LEAST(col1, col2) from db.tbl;”

Alvaro says:

**show partitions table;** to show the partitions!!

usually needed when you get this error:

` [Error 10041]: No partition predicate found for Alias `

Krigu says:


Why do we use )ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘01’stored as textfile What is the use of ‘01’ as delimiter value is there any specific for this Architectures while doing the migration from Green Plum Dbase to Hive Using SQOOP


Javi says:

octal 001 (^A) is the default field delimiter in Hive

kartik says:

what is the query for extracting a alternative column in hive??

manjul says:

Is it the update one

rb says:

how do i select based on column datatype
eg. select tablename, column from where datatype=’STRING’
does hive have something like the data dictionary table where you can select table, database , column info ?

Ameya says:

how do i sum up the values in two columns in a row and save it in a third column for the same row.
table has col x, y & z. for each row of the column, add x value and y value and store it in column z

vinay says:

Hive query to check the number of tables under a hive DB ? I’m looking for a query to show the count of tables in a particular DB.

radhik says:

is there md5 sql example for hive tables

MM says:

How can I get last 3 months data based on today’s date
Example: if today is Dec 24 2018, I need data from SEP 1 2018 to Nov 30 2018

Leave a Reply

Your email address will not be published. Required fields are marked *

If you have specific technical questions, please post them in the Forums