Hortonworks Sandbox Forum

PacketPig and the Sandbox


    Larry Liu

    Hi, James

    Here are other blogs which are great examples:

    I think the Sandbox should be able to host a PacketPig demo. If you are concerned about performance, you can turn off other services such as HBase and Hive.

    Do let me know if you run into any issues with the demo.



    I found install instructions for PacketPig here: https://github.com/packetloop/packetpig/blob/master/INSTALL.md but they are for Ubuntu, which uses Debian-style install procedures. Does anyone have a set of instructions modified for CentOS (relying on yum)? The GitHub install instructions also use Cloudera resources, which we of course don’t want to rely on!


    Hi James,

    Thanks for your continued use of the Sandbox. We currently do not have any instructions for installing PacketPig on CentOS and I was unable to find any in a quick search of the web. I’ll keep digging as time allows, though.



    Hi James,

    Also of note: the Sandbox runs CentOS 6.3, which PacketPig does not currently support, even if you manage to get it installed.



    Thanks to respondents who are checking into this. If I find out anything I will report here as well.

    It is surprising that, although Hortonworks gives PacketPig good visibility in the blogs it hosts, there are no recipes for installing PacketPig inside a Hortonworks container.

    Not even full HDP, eh?


    Hi James,

    I have written walkthroughs for installing Packetpig under Ubuntu and Mac OS X to try to get people off and running. I was also investigating a Vagrant/Puppet combination to get people up and running quickly.

    At the moment Packetpig is not linked to HDN. The articles on Hortonworks came out of a collaboration between Russell Jurney and me on some cool things you can do with Pig.

    Packetpig is maintained here https://github.com/packetloop/packetpig and this is where people raise most of their issues etc. or you can use @packetpig on Twitter.

    Packetpig is developed mainly by Packetloop (where I work) and we blog about it on http://blog.packetloop.com

    As Packetpig is open source, if anyone would like to submit a pull request updating the install guides for other platforms, that would be *really* helpful 😉

    Hope this helps.



    Hi Michael,

    Thanks a lot for the information you have provided. Packetpig can be installed on Ubuntu and Mac OS X. Have you tried to install it on another OS, such as CentOS 6.x?




    Thanks for the detailed comments. I will see if I have the chops to make a stab at a CentOS install guide and then create the pull request to document what I have done. Maybe someone will beat me to it — some Hortonworks people maybe?

    First I want to try out the OS X guide and maybe that will provide some inspiration.



    I did follow the Mac OS X instructions just to see if I could get packetpig to work.

    I got all the way to the end, but it fails here:

    more output/binning/part-r-00000

    I don’t see any output directory. I see lots of log info when the pig job runs, do not see any obvious errors, but I don’t see the mapreduce success message either.

    Before I get back to the command prompt, I do see a bunch of output to the screen and this looks like it might be the actual output of the pig script but I am not certain.

    This is off-topic to the Sandbox, so maybe I should post the question somewhere else — where might that be?

    Running 10.8.3 — I did use brew as suggested in the install instructions to set everything up. Some warnings along the way but nothing too noteworthy.


    Larry Liu

    Hi, James

    The JobTracker log is a good starting point for finding out why the Pig job failed. You can also upload to our ftp site:



    Sorry for the confusion on the last post. Was not running Pig on Hortonworks but inside Mac OS X with hadoop and pig installed there as directed by the PacketPig install guide.

    I suspect I need to do some more Hadoop configuration after installing that is assumed, but not presented, in the PacketPig install guide. For example, I did not configure HDFS or pay attention to the hadoop configuration file entries but perhaps I need to.

    I did say the content of the reply was off-topic to the Sandbox.


    Hi James,

    The binning.pig job is really simple and outputs exactly what you saw:

    more output/binning/part-r-00000

    The data/web.pcap is a small pcap so when you bin it based on time you get a single bin. But if you give it a larger pcap it will output more lines of output.

    So if binning works you are on your way. For a single node (e.g. Mac OS X) you don’t need to worry about HDFS etc. You only need it when you start building a cluster. When you start using Packetpig on a cluster you will need to copy the pcaps into HDFS so all nodes can access them.
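For the cluster case mentioned above, copying a pcap into HDFS could look roughly like the sketch below. It uses only the standard `hadoop fs -mkdir` and `hadoop fs -put` commands; the `stage_pcap` function name, the `HADOOP_CMD` override and the target directory are assumptions made for illustration and are not part of Packetpig or its install guide.

```shell
#!/bin/sh
# HADOOP_CMD is a hypothetical override so the function can be dry-run
# (for example HADOOP_CMD=echo); it defaults to the real `hadoop` CLI.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Copy one local pcap into an HDFS directory.
# Usage: stage_pcap <local-pcap> <hdfs-dir>
stage_pcap() {
    local_pcap="$1"
    hdfs_dir="$2"
    # -p creates missing parent directories on Hadoop 2.x and later.
    # Older 1.x releases create parents by default and may not accept -p,
    # so drop the flag there.
    "$HADOOP_CMD" fs -mkdir -p "$hdfs_dir" &&
    "$HADOOP_CMD" fs -put "$local_pcap" "$hdfs_dir/"
}
```

A call such as `stage_pcap data/web.pcap /user/james/pcaps` (the HDFS path is a made-up example) would then make the sample capture visible to every node.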


    Hi Abdelrahman,

    We had a ticket raised yesterday specific to CentOS: https://github.com/packetloop/packetpig/issues/7

    I will work a bit more on Puppet so that we can add a Puppet file to the project source, and then a single command will prepare the system for Packetpig. I will also write a manual install guide for CentOS.

    If you have any other issues, please raise them as GitHub issues if you can. That would be great!

    – Michael


    My problem is that the job did NOT produce that output. I copied what the install file said I should see.

    I expected an output directory to be created where I ran the Pig script, containing a simple CSV file. That was not the case. Some garbled output was written to my shell window at the end of the Pig job, but no result in this format.

    I am confused. Where should the output directory be created?

    Did I need to set up HDFS first?



    OK — this is weird. In the code I obtained from the git clone command for packetpig, the binning.pig script ended like this:

    ILLUSTRATE joined;
    --STORE summary INTO '$output/binning' USING PigStorage(',');

    The STORE command was commented out! When I removed the comment prefix (--), the script ran and produced the expected output.
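A quick way to spot this kind of problem before running a job is to search the script for STORE statements hidden behind Pig's `--` comment prefix. The sketch below is only an illustration: the `find_commented_store` name is made up, and the path you pass should be wherever binning.pig lives in your clone.

```shell
#!/bin/sh
# List commented-out STORE statements in a Pig script.
# Usage: find_commented_store path/to/script.pig
find_commented_store() {
    # -n prints line numbers; -i because Pig keywords are case-insensitive.
    # The pattern matches optional indentation, the "--" comment prefix,
    # optional spaces, then STORE followed by whitespace.
    grep -niE '^[[:space:]]*--[[:space:]]*STORE[[:space:]]' "$1"
}
```

If the function prints nothing and returns a non-zero status, no STORE line in that file is commented out with `--`.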

    Would that be some sort of regression?



    Hi James,
    If that is the case, I would suggest reporting on GitHub what you found about the STORE command being commented out.



    I created an issue and someone corrected the code already in the main branch!

    Good responsiveness.


