Pig, ToJson, and Redis to publish data with Flask
February 15th, 2013
Pig can easily stuff Redis full of data. To do so, we’ll need to convert our data to JSON, because the Redis storer doesn’t accept complex Pig records such as bags. We’ve previously talked about pig-to-json in “JSONize anything in Pig with ToJson”. Once we convert our data to JSON, we can use the pig-redis project to load Redis.
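To show why the JSON step matters, here is a minimal Python sketch of the idea. It is not ToJson itself and does not reproduce its exact output format. A record with a nested bag of recipient tuples is flattened into a (key, JSON string) pair that a plain key/value store can hold. The recipient fields `address` and `name` and the helper `to_kv_pair` are hypothetical, chosen for illustration.

```python
import json
from typing import List, Tuple

# Hypothetical recipient shape (address, name); the real Enron Avro schema may differ.
Recipient = Tuple[str, str]


def to_kv_pair(message_id: str, tos: List[Recipient]) -> Tuple[str, str]:
    """Flatten a record with a nested bag into a (key, JSON string) pair.

    Illustrative only: this mimics the idea behind ToJson, not its exact output.
    The bag of recipient tuples becomes a JSON array of objects, a plain string
    that a key/value store such as Redis can hold as a value.
    """
    bag_json = json.dumps([{"address": address, "name": name} for address, name in tos])
    return message_id, bag_json


if __name__ == "__main__":
    key, value = to_kv_pair(
        "<1.JavaMail.example@thyme>",  # made-up message ID
        [("jane@example.com", "Jane"), ("bob@example.com", "Bob")],
    )
    print(key)
    print(value)
```

Reading the value back with `json.loads` recovers the nested structure, which is what makes a string-valued store usable for complex records.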
Build the pig-to-json project:
git clone git@github.com:rjurney/pig-to-json.git
cd pig-to-json
ant
Then run our Pig code:
/* Register the Avro, JSON and piggybank jars, and define a shortcut for AvroStorage */
register /me/Software/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/Software/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/Software/pig/contrib/piggybank/java/piggybank.jar
define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
-- Register the ToJson UDF and the Redis storer
register /me/Software/pig-to-json/dist/lib/pig-to-json.jar
register /me/Software/pig-redis/dist/pig-redis.jar
-- Enron emails are available at https://s3.amazonaws.com/rjurney_public_web/hadoop/enron.avro
emails = load '/me/Data/enron.avro' using AvroStorage();
-- Pair each message_id with its recipients (tos) serialized as a JSON string
json_test = foreach emails generate message_id, com.hortonworks.pig.udf.ToJson(tos) as bag_json;
-- Store each (message_id, bag_json) pair in Redis on localhost using the 'kv' mode; 'dummy-name' is only a placeholder location
store json_test into 'dummy-name' using com.hackdiary.pig.RedisStorer('kv', 'localhost');
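The store step can be pictured with the Python sketch below. It assumes, without having checked pig-redis’s source, that the 'kv' mode writes the first field of each tuple as the key and the second as the value, like a plain Redis SET. The client is any object with a `set(key, value)` method. redis-py’s `redis.Redis(host="localhost")` has one, and `DictClient` is a hypothetical in-memory stand-in so the sketch runs without a Redis server.

```python
from typing import Dict, Iterable, Protocol, Tuple


class KVClient(Protocol):
    """Anything with a Redis-style set(key, value) method."""

    def set(self, name: str, value: str) -> object: ...


class DictClient:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, name: str, value: str) -> bool:
        self.data[name] = value
        return True


def store_kv(client: KVClient, rows: Iterable[Tuple[str, str]]) -> int:
    """Write (key, value) rows with SET semantics (a repeated key overwrites).

    Assumes 'kv' mode means first field = key, second field = value.
    Returns the number of rows written.
    """
    count = 0
    for key, value in rows:
        client.set(key, value)
        count += 1
    return count


if __name__ == "__main__":
    client = DictClient()
    written = store_kv(client, [("id1", '["a@example.com"]'), ("id2", "[]")])
    print(written, client.data)
```

Against a real server, passing `r.pipeline()` from redis-py and then calling its `execute()` would batch the writes instead of making one round trip per row.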
Now run our Flask web server:
python server.py
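server.py itself isn’t reproduced in this post. As a hedged sketch of what such a server might look like, the Flask app below looks up a message ID and returns the stored JSON string unchanged. The route `/email/<message_id>`, the `create_app` factory and the injected `store` are assumptions, not the code in the linked repository. `store` can be anything with a `get` method that returns a string, bytes or `None`, as redis-py’s `Redis.get` does.

```python
from typing import Optional, Protocol, Union

from flask import Flask, Response, abort


class KVReader(Protocol):
    """Anything with a Redis-style get(key) method."""

    def get(self, name: str) -> Optional[Union[str, bytes]]: ...


def create_app(store: KVReader) -> Flask:
    """Build a Flask app that serves stored JSON values by message ID (hypothetical route)."""
    app = Flask(__name__)

    @app.route("/email/<path:message_id>")
    def email_recipients(message_id: str) -> Response:
        value = store.get(message_id)
        if value is None:
            abort(404)  # no such key in the store
        # The value is already a JSON string, so return it without re-encoding.
        return Response(value, mimetype="application/json")

    return app


if __name__ == "__main__":
    import redis  # redis-py; assumes a Redis server on localhost:6379

    create_app(redis.Redis(host="localhost")).run(debug=True)
```

Injecting the store keeps the route testable without a running Redis server. The Pig job has already done the serialization, so the web layer only passes strings through.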
Code for this post is available here: https://github.com/rjurney/enron-pig-tojson-redis-node.

> need to convert our data to JSON
Why? JSON is not an optimal Redis format, as it is not one of the Redis types. Redis seems more suited for key->value structures. Won’t Redis be stuffed with a bunch of documents that it has no native functions for?
JSON would be optimal for a MongoDB or similar data store.
The Redis driver doesn’t accept complex records from Pig, so ToJson lets you store them anyway by serializing them to JSON strings.