The need for a ToJson EvalFunc
When integrating Pig with different NoSQL ‘databases,’ or when publishing data from Hadoop, it can be convenient to serialize your data as JSON. Pig has JsonStorage, but until now it has had no ToJson EvalFunc. That gap was inconvenient: in our post about Pig and ElasticSearch, creating JSON for ElasticSearch to index meant storing the relation with JsonStorage and then loading it back as plain text, with tricks like this:…
store enron_emails into '/tmp/enron_emails_elastic' using JsonStorage();
json_emails = load '/tmp/enron_emails_elastic' AS (json_record:chararray);
/* Now we can store our email json data to elasticsearch for indexing with message_id.