Analyzing Social Media and Customer Sentiment

With Hadoop, you can mine Twitter, Facebook, and other social media conversations for sentiment data about you and your competition, and use it to make targeted, real-time decisions that increase market share.

This demo takes a Hadoop user through the steps for extracting sentiment data from Twitter and analyzing the performance of a recent movie release.

In this tutorial, you’ll learn how to:

  • Stream Twitter feeds into HDFS with Flume
  • Use HCatalog to build a relational view of the data
  • Use Hive to query and refine the data
  • Import the data into Microsoft Excel with the ODBC connector
  • Visualize the data with Power View

This Hadoop tutorial is from the Hortonworks Sandbox – a single-node Hadoop cluster running in a virtual machine. Download the Sandbox to run this and other tutorials in the series. The tutorials presented here are for Sandbox v2.0.

Sentiment Data

Sentiment data is unstructured data that represents opinions, emotions, and attitudes contained in sources such as social media posts, blogs, online product reviews, and customer support interactions.

Potential Uses of Sentiment Analysis

Organizations use sentiment analysis to understand how the public feels about something at a particular moment in time, and also to track how those opinions change over time.

An enterprise may analyze sentiment about:

    • A product – For example, does the target segment understand and appreciate messaging around a product launch? What products do visitors tend to buy together, and what are they most likely to buy in the future?
    • A service – For example, a hotel or restaurant chain can identify locations with particularly strong or poor service.
    • Competitors – In what areas do people see our company as better than (or weaker than) our competition?
    • Reputation – What does the public really think about our company? Is our reputation positive or negative?

In this tutorial, we will focus on a product launch. Specifically, we will look at public sentiment during the days leading up to and immediately following the recent release of the movie Iron Man 3.

How did the public feel about the debut, and how might that sentiment data have been used to better promote the movie’s launch?


Prerequisites

  • Hortonworks Sandbox (installed and running)
  • Hortonworks ODBC driver installed and configured

Notes:


  • In this tutorial, the screenshots show the Hortonworks Sandbox installed on an Oracle VirtualBox virtual machine (VM); your screens may be different.
  • Install the ODBC driver that matches the version of Excel you are using (32-bit or 64-bit).
  • In this tutorial, we will use the Power View feature in Excel 2013 to visualize the sentiment data. Power View is currently only available in Microsoft Office Professional Plus and Microsoft Office 365 Professional Plus.
  • Other versions of Excel will work, but the visualizations will be limited to charts. You can also connect any other visualization tool you like.


To refine and visualize the Twitter sentiment data, we will:

  • Download and extract the sentiment tutorial files.
  • Load Twitter data into the Hortonworks Sandbox.
  • Copy a Hive script to the Sandbox.
  • Run the Hive script to refine the raw data.
  • Access the refined sentiment data with Excel.
  • Visualize the sentiment data using Excel Power View.

Step 1: Download and Extract the Sentiment Tutorial Files

  • You can download a set of sample Twitter data contained in a compressed (.zip) folder here: SentimentFiles.zip. The Twitter data was obtained using Apache Flume. Flume can be used as a log aggregator, collecting log data from many diverse sources and moving it to a centralized data store. In this case, Flume was used to capture the Twitter stream data, which we can now load into the Hadoop Distributed File System (HDFS).
  • Save the file to your computer, and then extract the files. You should see a SentimentFiles folder.
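Each record captured by Flume is a raw JSON document from the Twitter streaming API. As a rough illustration of the raw-to-tabular conversion the Hive script performs later, the sketch below flattens one record into a row (a minimal sketch only; the field names are a simplified subset, and real tweets carry many more fields):

```python
import json

def tweet_to_row(raw_json):
    # Flatten one raw tweet JSON document into a flat tuple,
    # roughly what the Hive script's JSON SerDe does at table scale.
    tweet = json.loads(raw_json)
    return (tweet["id"], tweet["created_at"], tweet["text"], tweet.get("lang", ""))

# Hypothetical record, much smaller than a real streaming API payload:
raw = '{"id": 1, "created_at": "Fri May 03 04:51:22 +0000 2013", "text": "Iron Man 3 was great!", "lang": "en"}'
print(tweet_to_row(raw))
```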

Step 2: Load Twitter Data into the Hortonworks Sandbox

We will now load Twitter data into the Sandbox.

To make things simpler, we have prepared a zip file with all the data files we need uploaded into the Sandbox. Click the File Browser tab at the top, then select Upload -> Zip File.

In the file selection box, navigate into the SentimentFiles folder, select the SentimentFiles.zip file, and start the upload.

The file will be uploaded to the Sandbox VM and automatically unpacked.

When the upload is complete, you will see a new folder called upload.

Step 3: Copy a Hive Script to the Sandbox

We will now use SCP to copy the hiveddl.sql file to the Sandbox. The procedure is slightly different for Windows and Mac, so both methods are described here.

Mac OS X — Copy the hiveddl.sql File to the Sandbox

    • Open a Terminal window and navigate to the “SentimentFiles/upload/hive” subfolder in the SentimentFiles folder you extracted previously. Type in the following commands, pressing the Enter key after each:

scp -P 2222 hiveddl.sql root@

scp -P 2222 json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar root@

    • You must use an uppercase “P” for the “-P” option in these commands.
    • You may be prompted to validate the authenticity of the host. If so, type “yes” when prompted.

Here is what our screen looked like after we completed the transfers.

    • When prompted, type in the Sandbox password (“hadoop”), then press Enter. This command copies the hiveddl.sql file to the root folder on the Sandbox. When the file transfer is complete, a confirmation message will appear in the terminal window:

After the files have been uploaded, you can connect to the Sandbox with an SSH session and verify that the files arrived.

Windows 7: Copy the files to the Sandbox

On Windows you will need to download and install the free WinSCP application.

    • Open WinSCP and type in the following settings, then click Login.
    • Host name:
    • Port: 2222
    • User name: root

  • Type the Sandbox password (“hadoop”) in the Password box, then click OK.

  • Use the WinSCP file browser to navigate to the SentimentFiles\hive folder in the left-hand pane, and to the Sandbox /root folder in the right-hand pane. Drag and drop the hiveddl.sql file from the SentimentFiles\hive folder to the /root folder on the Sandbox.

Click Copy on the Copy pop-up to confirm the file transfer. Do the same with the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file.

Step 4: Run the Hive Script to Refine the Raw Data

    • In the Hortonworks Sandbox virtual machine (VM) console window, press the Alt and F5 keys, then log in to the Sandbox using the following credentials:

Login: root
Password: hadoop

After you log in, the command prompt will appear with the prefix [root@sandbox ~]#:

    • At the command prompt, type in the following command, then press the Enter key:

hive -f hiveddl.sql

Lines of text appear as the script runs a series of MapReduce jobs. It will take a few minutes for the script to finish running. When the script has finished running, the time taken is displayed, and the normal command prompt appears.

The hiveddl.sql script has performed the following steps to refine the data:

  • Converted the raw Twitter data into a tabular format.
  • Used the dictionary file to score the sentiment of each Tweet by the number of positive words compared to the number of negative words, and then assigned a positive, negative, or neutral sentiment value to each Tweet.
  • Created a new table that includes the sentiment value for each Tweet.
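The scoring step is done inside hiveddl.sql, but the idea is simple enough to sketch in a few lines of Python (the word sets below are stand-ins for the dictionary file shipped with the tutorial):

```python
# Stand-ins for the tutorial's dictionary file of scored words.
POSITIVE = {"great", "awesome", "love", "amazing"}
NEGATIVE = {"terrible", "hate", "boring", "awful"}

def score_tweet(text):
    # Compare the number of positive words to the number of negative words,
    # then assign a positive, negative, or neutral label.
    words = text.lower().split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"

print(score_tweet("what a great movie, love it"))  # positive
print(score_tweet("boring and awful"))             # negative
```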

We can look at the data using the Hive command line. Start Hive by typing hive at the prompt.

Remember to add the JSON SerDe jar file so we can look at the tables.

The command “show tables;” lists the tables. You can browse a table with the command “select * from <table_name> limit 10;”. The “limit 10” clause returns the first 10 records instead of the whole table.

Let’s use HCatalog to take a quick look at the data.

  • Open the Sandbox HUE user interface, then click HCatalog in the menu at the top of the page. Select the check box next to the “tweetsbi” table, then click Browse Data. The “tweetsbi” table is the table created by the Hive script that added a column with the sentiment value for each tweet. (Note, you may need to scroll right to see all of the columns.)

Now that we have refined Twitter data in a tabular format with sentiment ratings, we can access the data with Excel.

Step 5: Access the Refined Sentiment Data with Excel

In this section, we will use Excel Professional Plus 2013 to access the refined sentiment data.

    • In Windows, open a new Excel workbook, then select Data > From Other Sources > From Microsoft Query.

  • On the Choose Data Source pop-up, select the Hortonworks ODBC data source you installed previously, then click OK. The Hortonworks ODBC driver enables you to access Hortonworks data with Excel and other Business Intelligence (BI) applications that support ODBC.

  • After the connection to the Sandbox is established, the Query Wizard appears. Select the “tweetsbi” table in the Available tables and columns box, then click the right arrow button to add the entire “tweetsbi” table to the query. Click Next to continue.

  • Select the “text” column in the “Columns in your query” box, then click the left arrow button to remove the text column.

  • After the “text” column has been removed, click Next to continue.

  • On the Filter Data screen, click Next to continue without filtering the data.

  • On the Sort Order screen, click Next to continue without setting a sort order.

  • Click Finish on the Query Wizard Finish screen to retrieve the query data from the Sandbox and import it into Excel.

  • On the Import Data dialog box, click OK to accept the default settings and import the data as a table.

  • The imported query data appears in the Excel workbook.

Now that we have successfully imported the Twitter sentiment data into Microsoft Excel, we can use the Excel Power View feature to analyze and visualize the data.

Step 6: Visualize the Sentiment Data Using Excel Power View

Data visualization can help you make sense of sentiment data at a glance. In this section we will see how sentiment varies by country, and review the sentiment data for the United States.

    • In the Excel worksheet with the imported “tweetsbi” table, select Insert > Power View to open a new Power View report.

  • The Power View Fields area appears on the right side of the window, with the data table displayed on the left. Drag the handles or click the Pop Out icon to maximize the size of the data table.

  • In the Power View Fields area, clear the checkboxes next to the id and ts fields, then click Map on the Design tab in the top menu.

  • The map view displays a global view of the data.

  • Now let’s display the sentiment data by color. In the Power View Fields area, click sentiment, then select Add as Color.

  • Under SIZE, click sentiment, then select Count (Not Blank).

  • Now the map displays the sentiment data by color:
  • Orange: positive
  • Blue: negative
  • Red: neutral

  • Use the map controls to zoom in on Ireland. About half of the tweets have a positive sentiment score, as indicated by the color orange.

  • Now use the map controls to zoom in on Mexico. In Mexico, about one-fifth of the tweets expressed negative sentiment (shown in blue), and only a small portion of the tweets were positive. Most tweets from Mexico were neutral, as shown in red.

  • Next, use the map controls to zoom in on the sentiment data in China. Marvel Studios and the Chinese studio DMG co-financed Iron Man 3, and the cast included a famous Chinese actress. We can see that the majority of tweets from China are neutral, with positive sentiment slightly outweighing negative sentiment.

  • The United States is the biggest market, so let’s look at the sentiment data there. The size of the United States pie chart indicates that a relatively large number of the total tweets come from the US. About half of the tweets in the US show neutral sentiment, with a relatively small amount of negative sentiment.
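The country-level breakdowns that Power View visualizes can also be computed directly from rows of the tweetsbi table. A minimal sketch, assuming each row reduces to a (country, sentiment) pair:

```python
from collections import Counter

def sentiment_by_country(rows):
    # rows: iterable of (country, sentiment) pairs.
    # Returns {country: {sentiment: fraction of that country's tweets}}.
    counts = {}
    for country, sentiment in rows:
        counts.setdefault(country, Counter())[sentiment] += 1
    return {
        country: {s: n / sum(ctr.values()) for s, n in ctr.items()}
        for country, ctr in counts.items()
    }

# Toy data shaped like the tutorial's observations:
rows = [("Mexico", "neutral"), ("Mexico", "neutral"), ("Mexico", "negative"),
        ("Ireland", "positive"), ("Ireland", "negative")]
print(sentiment_by_country(rows))
```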

We’ve shown a visualization of Twitter sentiment data for the release of Iron Man 3. This information will be useful for planning marketing activities for any future Iron Man movie releases.


We are eager to hear your feedback on this tutorial. Please let us know what you think.



April 18, 2014 at 8:25 am

How do you do it for some other movie or keyword? What files need to be modified?

Will H
May 14, 2014 at 11:00 am

I am having an issue running the initial hive script. Not sure if I’m doing something wrong but am seeing the following error:

[root@sandbox ~]# hive -f hiveddl.sql

Logging initialized using configuration in file:/etc/hive/conf.dist/
Added json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar to class path
Added resource: json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
[root@sandbox ~]#

    Rohit Gore
    January 11, 2015 at 10:12 pm

    I am facing the same problem:
    [root@sandbox ~]# hive -f hiveddl.sql

    Logging initialized using configuration in file:/etc/hive/conf.dist/
    Added json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar to class path
    Added resource: json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
    [root@sandbox ~]#

May 16, 2014 at 5:49 am

Hi, when I run the hiveddl.sql script, all it does is create a bunch of tables and views with no data. When I opened the script I found no LOAD statement in there. Is it possible the script is not complete, or am I missing something here?

May 19, 2014 at 3:16 pm

Does not work on HDP 2.1 with Hive 0.13; I created the json-serde again, but it still fails.

Driver returned: 1. Errors: OK
converting to local hdfs://
Added /tmp/08cf0f24-0df6-4b44-8890-6150f2873398_resources/json-serde- to class path
Added resource: /tmp/08cf0f24-0df6-4b44-8890-6150f2873398_resources/json-serde-
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class

Yogesh Sobale
May 20, 2014 at 3:19 am

In the given example, the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar version is not compatible with the Hive installed in HDP 2.1.
After debugging I found that the Hive installed in the Sandbox has a hive-serde- jar which is not compatible with json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar. It has an issue with the constructor AbstractPrimitiveJavaObjectInspector(). Can you please provide the correct jar?

June 10, 2014 at 1:13 am

I followed the same steps above, but while executing hiveddl.sql, I get an error:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTa sk. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJav aObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive /PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V

Please help.

June 10, 2014 at 1:32 am

I have followed the same steps, but while executing hiveddl.sql, I get the following error.

Executed in PuTTY:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTa sk. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJav aObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive /PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V

Executed from the HUE shell:

Driver returned: 1. Errors: OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class

Please help.

Tri Nguyen
June 19, 2014 at 2:23 pm

Came here from the flume page
The description about the data collection using Flume is almost non-existent. The article should at least add a description about the flume configuration file.

    Jules S. Damji
    July 16, 2014 at 12:59 pm

    Thanks. Will take note.

July 9, 2014 at 5:53 am

Hi, I have tried to execute this from the Sandbox, logged in as root, but during the execution of hiveddl.sql I faced this error:

Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJav aObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive /PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V

Any idea what it might be due to?

    Rohit Gore
    January 12, 2015 at 9:58 pm

    Hi Gaurav, I am also facing the same issue. Please help me resolve it if you have a solution.


July 15, 2014 at 5:56 am

Thank you for this very interesting tutorial, but I have a problem:
When I try to connect with WinSCP using the settings you give, the connection is refused.
Have you seen this problem?
Thank you in advance.

    August 5, 2014 at 10:29 am

    I have the same problem.

    August 5, 2014 at 10:39 am

    Port should be 22 not 2222.

Sagar Prasad
August 6, 2014 at 12:52 am

For issue : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V

I tried the suggestion from Brandon and it worked for me.
This issue is caused by an incompatibility with the SerDe jar that is packed with the demo. To fix this, use the version of SerDe that is now packaged with HCatalog. Make the following changes to hiveddl.sql

1) Comment out or remove the first line that adds the SerDe jar:
-- ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar;
2) Change line 34 to refer to the HCatalog version of the SerDe.

    September 6, 2014 at 10:14 pm

    Hi Prasad, thanks. I followed the same steps. I see Map 100% Reduce 100% but still got an exception. Now when trying to rerun, I get AlreadyExistsException: Table tweets_raw already exists.

      Mungeol Heo
      November 6, 2014 at 10:21 pm

      Try to delete hive tables that the script created, then rerun it.

December 18, 2014 at 6:25 pm

Can you put up more information on how you used Flume to get the Twitter data?

    July 28, 2015 at 11:17 pm

    I found your remark. I would be interested in information on getting Twitter data into Hadoop with Flume.

    I would appreciate your help.

    Kind regards,

December 29, 2014 at 3:37 am

I followed the exact steps mentioned above, but no data displays in Excel for the tweetsbi table; only the column headers are there. Please help me find what went wrong.


August 18, 2015 at 6:08 am

Someone could help me? i have this error.
Added [json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar] to class path
Added resources: [json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar]
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(
at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameColonType(
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameColonTypeList(
at org.apache.hadoop.hive.ql.parse.HiveParser.structType(
at org.apache.hadoop.hive.ql.parse.HiveParser.type(
at org.apache.hadoop.hive.ql.parse.HiveParser.colType(
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(
at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(
at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(
at org.apache.hadoop.hive.ql.Driver.compile(
at org.apache.hadoop.hive.ql.Driver.compile(
at org.apache.hadoop.hive.ql.Driver.compileInternal(
at org.apache.hadoop.hive.ql.Driver.runInternal(
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
at org.apache.hadoop.hive.cli.CliDriver.processCmd(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.processReader(
at org.apache.hadoop.hive.cli.CliDriver.processFile(
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(
at org.apache.hadoop.hive.cli.CliDriver.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.util.RunJar.main(
FAILED: ParseException line 12:6 Failed to recognize predicate ‘user’. Failed rule: ‘identifier’ in column specification
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.

August 20, 2015 at 8:03 am

It looks like some changes have been made in recent HDP releases that prevent this tutorial from working as it is written. I’m working with the tutorial team at Hortonworks to try to get this updated, but in the meantime I’ve put together a script that fixes the issues and installs the tutorial data on the latest HDP sandbox. It’s available here:

Until we get the tutorial updated, please try the above to run this.

(Disclaimer: while I am an employee of Hortonworks on the NE US Solutions Engineering team, this is not an official update to the tutorial. It’s just something I put together on my own to get things working and wanted to share. However, if you have any problems I’m happy to try to help. Just reach out to me at rmccollam [at] hortonworks [dot] com.)

September 5, 2015 at 8:07 pm

One more way to avoid reserved keywords like ‘user’ is to set the following property before creating the tweets_raw table.


Also, it’s good to download the latest Hive JSON SerDe if you are using the current version of Hive (1.2).

ADD JAR json-serde-;

Here is the place where you can download it

October 18, 2015 at 12:17 am

What automated tools are required for analyzing the data?

October 18, 2015 at 1:01 pm

Perhaps this would be helpful:

By the way, I have also edited the paths. As I understand, you upload the whole folder “SentimentFiles”, not only “upload”: e.g.
LOCATION ‘/user/hue/SentimentFiles/SentimentFiles/upload/data/time_zone_map’

November 24, 2015 at 3:40 pm

Hello all,

Well, I tried everything. I applied the fix and changed values, but still nothing is working. Is there a “full” solution coming?
