February 15, 2013

Pig, ToJson, and Redis to publish data with Flask

Pig can easily stuff Redis full of data. The pig-redis storer doesn’t accept complex records such as bags and tuples, so we first need to convert our data to JSON. We’ve previously talked about pig-to-json in JSONize anything in Pig with ToJson. Once we convert our data to JSON, we can use the pig-redis project to load Redis.

Build the pig-to-json project:

git clone git@github.com:rjurney/pig-to-json.git
cd pig-to-json
ant

Then run our Pig code:

/* Load Avro jars and define shortcut */
register /me/Software/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/Software/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/Software/pig/contrib/piggybank/java/piggybank.jar
define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

register /me/Software/pig-to-json/dist/lib/pig-to-json.jar
register /me/Software/pig-redis/dist/pig-redis.jar

-- Enron emails are available at https://s3.amazonaws.com/rjurney_public_web/hadoop/enron.avro
emails = load '/me/Data/enron.avro' using AvroStorage();

json_test = foreach emails generate message_id, com.hortonworks.pig.udf.ToJson(tos) as bag_json;

store json_test into 'dummy-name' using com.hackdiary.pig.RedisStorer('kv', 'localhost');
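
To make the reason for the ToJson step concrete, here is a small Python sketch of the same idea: a nested record is flattened into a string key and a JSON string value, which is the shape a plain key/value store can hold. The record contents, the `address` field name, and the `to_key_value` helper are hypothetical illustrations, not the actual Enron schema or the pig-redis implementation, and a dict stands in for Redis.

```python
import json

# Hypothetical record shaped like one row of `emails`.
# The field names inside `tos` are assumptions for illustration only.
record = {
    "message_id": "msg-1",
    "tos": [{"address": "a@example.com"}, {"address": "b@example.com"}],
}


def to_key_value(rec):
    """Flatten a nested record into a (key, JSON string) pair."""
    return rec["message_id"], json.dumps(rec["tos"])


key, value = to_key_value(record)

# A dict stands in for Redis here: string keys mapped to string values.
store = {key: value}

# A consumer reads the string back and parses it to recover the structure.
restored = json.loads(store[key])
```

The value stored is an opaque string as far as the store is concerned; the structure only reappears once the reader parses it.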

Now run our Flask web server:

python server.py
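
The actual server.py lives in the linked repository. As a rough idea of what such a server does, the following is a minimal sketch, assuming the Pig job wrote each message_id as a Redis key with the JSON string as its value. The `/email/...` route, the `create_app` and `main` names, and the default Redis port are assumptions made for this example, not taken from the original code.

```python
import json

from flask import Flask, abort, jsonify


def create_app(store):
    """Build a Flask app that serves JSON values from a key/value store.

    `store` is anything with a .get(key) method that returns a JSON
    string (or None when the key is missing), such as a redis.Redis
    client or a plain dict.
    """
    app = Flask(__name__)

    @app.route("/email/<path:message_id>")
    def recipients(message_id):
        raw = store.get(message_id)
        if raw is None:
            abort(404)
        # The stored value is a JSON string; parse it before re-serializing.
        return jsonify(message_id=message_id, tos=json.loads(raw))

    return app


def main():
    """Serve from a local Redis instance (assumed host and port)."""
    import redis  # imported here so the app itself can be used without Redis

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    create_app(client).run(debug=True)
```

Calling `main()` starts the development server against a local Redis. Passing the store in as an argument also lets the routes be exercised with a plain dict, without a running Redis. Message IDs that contain characters such as angle brackets would need to be URL-encoded by the client.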

Code for this post is available here: https://github.com/rjurney/enron-pig-tojson-redis-node.

Comments

MattK says:

> need to convert our data to JSON

Why? JSON is not an optimal Redis format, as it is not one of the Redis types. Redis seems more suited for key->value structures. Won’t Redis be stuffed with a bunch of documents that it has no native functions for?

JSON would be optimal for a MongoDB or similar data store.

Russell Jurney says:

The redis driver doesn’t accept complex records from Pig, so ToJson lets you store complex records anyway.
