Building a Sentiment Analysis Application

Introduction

For this project, you will play the part of a Big Data Application Developer who leverages their skills as a Data Engineer and Data Scientist, using multiple Big Data technologies provided by Hortonworks Data Flow (HDF) and Hortonworks Data Platform (HDP) to build a real-time sentiment analysis application. First, you will learn to acquire tweet data from Twitter’s Decahose API and send the tweets to the Kafka topic “tweets” using NiFi. Right after finishing that NiFi flow, you will build a second NiFi flow that ingests data from the Kafka topic “tweetsSentiment” and stores it in HBase. Next, you will learn to build a Spark machine learning model that classifies tweets as happy or sad and export the model to HDFS. Before the model can be built, however, Spark requires the training data to be in the form of feature arrays, so you will first cleanse the data with SparkSQL. Once the model is built, you will use Spark Structured Streaming to load the model from HDFS, pull in tweets from the Kafka topic “tweets”, add a sentiment score to each tweet, and stream the results to the Kafka topic “tweetsSentiment”. Finally, with Hive and HBase integration, you will run queries that confirm the data was stored successfully and show the sentiment score for each tweet.
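The core per-tweet step of the streaming application (read a tweet from “tweets”, score it, write it to “tweetsSentiment”) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the tutorial's Spark code: the keyword scorer is a hypothetical stand-in for the trained model loaded from HDFS, and the `text` and `sentiment` field names are assumptions about the record layout. In the real application this logic would run inside Spark Structured Streaming over Kafka records rather than on a single string.

```python
import json
import re

# Hypothetical word lists for a toy scorer. The real application loads the
# trained Spark ML model from HDFS instead of using keywords.
HAPPY_WORDS = {"happy", "love", "great", "awesome"}
SAD_WORDS = {"sad", "hate", "terrible", "awful"}


def predict_sentiment(text: str) -> float:
    """Toy stand-in for the model's prediction: 1.0 means happy, 0.0 means sad.

    Ties (including tweets with no keywords) default to happy.
    """
    words = re.findall(r"[a-z']+", text.lower())
    happy = sum(w in HAPPY_WORDS for w in words)
    sad = sum(w in SAD_WORDS for w in words)
    return 1.0 if happy >= sad else 0.0


def enrich_tweet(raw_json: str) -> str:
    """Parse one record read from "tweets" and attach a "sentiment" field.

    The returned JSON string is what would be written to "tweetsSentiment".
    """
    tweet = json.loads(raw_json)
    tweet["sentiment"] = predict_sentiment(tweet.get("text", ""))
    return json.dumps(tweet)


if __name__ == "__main__":
    print(enrich_tweet('{"id": 1, "text": "I love this sunny day!"}'))
    # {"id": 1, "text": "I love this sunny day!", "sentiment": 1.0}
```

Keeping the scoring function separate from the JSON handling mirrors the application's split between the model (trained once, exported to HDFS) and the streaming job that only applies it.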

Big Data Technologies used to develop the Application:

Goals and Objectives

  • Learn to create a Twitter Application using Twitter’s Developer Portal to get KEYS and TOKENS for connecting to Twitter’s APIs
  • Learn to create a NiFi Dataflow Application that integrates Twitter’s Decahose API to ingest tweets, performs some preprocessing, and stores the data in the Kafka Topic “tweets”
  • Learn to create a NiFi Dataflow Application that ingests the Kafka Topic “tweetsSentiment” to stream sentiment tweet data to HBase
  • Learn to build a SparkSQL Application to clean the data and get it into a suitable format for building the sentiment classification model
  • Learn to build a SparkML Application to train and validate a sentiment classification model using Gradient Boosting
  • Learn to build a Spark Structured Streaming Application to stream the sentiment tweet data from Kafka topic “tweets” on HDP to Kafka topic “tweetsSentiment” on HDF while attaching a sentiment score per tweet based on output of the classification model
  • Learn to visualize the tweet sentiment score by using Zeppelin’s Hive interpreter mapping to the HBase table
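The cleansing and feature-array step in these goals can be illustrated with the hashing trick, which is roughly the job Spark ML's `Tokenizer` and `HashingTF` transformers perform. The sketch below is plain Python, not Spark: the cleaning rules (drop URLs and @mentions, keep word tokens) and the 16-bucket vector size are hypothetical choices, and bucket indices will not match Spark's, because `HashingTF` uses a different hash function (MurmurHash3).

```python
import re
import zlib

NUM_FEATURES = 16  # hypothetical small vector size; real models use far more


def clean_text(text: str) -> list[str]:
    """Lower-case, drop URLs and @mentions, keep word tokens (letters and apostrophes)."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    return re.findall(r"[a-z']+", text)


def to_feature_array(tokens: list[str], num_features: int = NUM_FEATURES) -> list[float]:
    """Hashing trick: each token increments the bucket its hash maps to."""
    features = [0.0] * num_features
    for token in tokens:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        bucket = zlib.crc32(token.encode("utf-8")) % num_features
        features[bucket] += 1.0
    return features
```

For example, `clean_text("Feeling HAPPY today @friend https://t.co/xyz")` returns `["feeling", "happy", "today"]`, and passing that to `to_feature_array` yields a 16-element list whose entries sum to 3.0. A fixed-length vector like this is what a classifier such as Gradient Boosting can train on, whatever the length of the original tweet.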

Prerequisites

Outline

The tutorial series consists of the following tutorial modules:

1. Application Development Concepts: You will be introduced to the fundamentals of sentiment analysis, ways to perform the analysis, and its various use cases.

2. Setting up the Development Environment: You will create a Twitter Application in Twitter’s Developer Portal to obtain KEYS and TOKENS. You will then write shell code and perform Ambari REST API calls to set up a development environment.

3. Acquiring Twitter Data: You will build a NiFi Dataflow to ingest Twitter data, preprocess it, and store it in the Kafka Topic “tweets”. The second NiFi Dataflow you will build ingests the enriched sentiment tweet data from the Kafka topic “tweetsSentiment” and streams the content of the flowfile to HBase.

4. Cleaning the Raw Twitter Data: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to clean the raw Twitter data in preparation for creating the sentiment classification model.

5. Building a Sentiment Classification Model: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to build a sentiment classification model that classifies tweets as Happy or Sad, then export the model to HDFS.

6. Deploying a Sentiment Classification Model: You will create a Scala IntelliJ project in which you develop a Spark Structured Streaming application that streams data from the Kafka topic “tweets” on HDP, adds a sentiment score to each tweet’s JSON data, and streams the result into the Kafka topic “tweetsSentiment” on HDF.

7. Visualizing Sentiment Scores: You will use Zeppelin’s JDBC Hive Interpreter to perform SQL queries against the NoSQL HBase table “tweets_sentiment” for visual insight into tweet sentiment scores.
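The Ambari REST API calls in the setup module commonly follow one pattern: a `PUT` to a service resource carrying the desired state. The Python sketch below only builds such a request (the tutorial itself uses shell code); the Ambari host, cluster name, and credentials are hypothetical placeholders, and `STARTED` and `INSTALLED` are the states Ambari uses for a running and a stopped service.

```python
import base64
import json
import urllib.request

# Hypothetical values: replace with your own Ambari host, cluster and credentials.
AMBARI_URL = "http://ambari.example.com:8080"
CLUSTER = "Sandbox"
USER, PASSWORD = "admin", "admin"


def service_state_request(service: str, state: str) -> urllib.request.Request:
    """Build (but do not send) an Ambari REST call that changes a service's state.

    state="STARTED" starts the service; state="INSTALLED" stops it.
    """
    url = f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/services/{service}"
    body = {
        "RequestInfo": {"context": f"Set {service} to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            # Ambari requires this header on modifying requests (CSRF protection).
            "X-Requested-By": "ambari",
            "Authorization": f"Basic {token}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen`, for example `urllib.request.urlopen(service_state_request("KAFKA", "STARTED"))`. Building the request separately from sending it keeps the sketch easy to inspect before touching a live cluster.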
