In a recent post we used Pig to summarize documents via the Term-Frequency, Inverse Document Frequency (TF-IDF) algorithm.
In this post, we’re going to turn that code into a Pig macro that can be called in one line of code:
my_tf_idf_scores = tf_idf(id_body, ‘message_id’, ‘body’);
Our macro, in filename tfidf.macro looks just like our pig script, with a couple of new lines. Note the use of macro variables for input and output preceded with the ‘$’ character: $in_relation, $out_relation, $id_field and $text_field.…