PacketPig and the Sandbox

This topic contains 17 replies, has 6 voices, and was last updated by James Solderitsch 1 year, 5 months ago.

Viewing 17 replies - 1 through 17 (of 17 total)

  • #20459

    I created an issue and someone corrected the code already in the main branch!

    Good responsiveness.

    Jim

    #20457

    Robert
    Participant

    Hi James,
    If that is the case, I would suggest adding a comment on GitHub describing what you found about the STORE command being commented out.

    Regards,
    Robert

    #19349

    OK — this is weird. In the code I obtained from the git clone command for packetpig, the binning.pig script ended like this:

    ILLUSTRATE joined;
    --STORE summary INTO '$output/binning' USING PigStorage(',');

    The STORE command was commented out! When I removed the comment prefix (--), the script ran and produced the expected output.

    Would that be some sort of regression?

    Jim
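A commented-out STORE statement like the one described above is easy to miss, because the script still runs and ILLUSTRATE still prints to the screen. The following is a minimal sketch, not part of Packetpig, of how one might scan a Pig script for STORE statements hidden behind Pig's `--` single-line comment prefix. The function name `commented_out_stores` and the sample text are illustrative assumptions.

```python
import re

# Pig Latin single-line comments start with "--".
COMMENTED_STORE = re.compile(r"^\s*--\s*STORE\b", re.IGNORECASE)


def commented_out_stores(script_text):
    """Return (line_number, line) pairs for STORE statements behind '--'."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if COMMENTED_STORE.match(line)
    ]


# Sample text modelled on the end of binning.pig as quoted in the post.
sample = """ILLUSTRATE joined;
--STORE summary INTO '$output/binning' USING PigStorage(',');
"""

for number, line in commented_out_stores(sample):
    print(f"line {number}: {line}")
```

The check only covers `--` comments; a STORE wrapped in a `/* ... */` block comment would need separate handling.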

    #19348

    My problem is that the job did NOT produce that output. I copied what the install file said I should see.

    I expected an output directory to be created where I ran the Pig script, containing a simple CSV file. That was not the case. Some garbled output was written to my shell window at the end of the Pig job, but there was no result in this format.

    I am confused. Where should the output directory be created?

    Did I need to set up HDFS first?

    Jim
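One quick way to settle whether a job wrote anything is to look for part files under the expected output path. The sketch below assumes a local-mode run where the relative path `output/binning` from the install guide resolves against the directory the script was started from; that is an assumption, not something confirmed in this thread. The helper name `find_part_files` is hypothetical.

```python
from pathlib import Path


def find_part_files(output_dir):
    """Return the part-* result files under output_dir, sorted by name."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("part-*") if p.is_file())


# "output/binning" is the relative path quoted from the install guide.
expected = Path("output/binning")
parts = find_part_files(expected)
if not parts:
    print("no part files found under", expected.resolve())
for part in parts:
    print(part, "->", part.read_text().strip())
```

Printing the resolved path makes it obvious which directory was actually checked, which helps when the job was launched from somewhere other than the project root.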

    #19344

    Hi Abdelrahman,

    We had a ticket raised yesterday that is specific to CentOS: https://github.com/packetloop/packetpig/issues/7

    I will work a bit more on Puppet so that we can add a Puppet file to the project source, and then a single command will prepare the system for Packetpig. I will also write a manual install guide for CentOS.

    If you run into any other issues, please raise them as GitHub issues if you can. That would be great!

    - Michael

    #19343

    Hi James,

    The binning.pig job is really simple and outputs exactly what you saw:

    more output/binning/part-r-00000
    1322643600,171738,142808,338610

    The data/web.pcap file is a small pcap, so when you bin it based on time you get a single bin. If you give it a larger pcap, it will produce more lines of output.

    So if binning works, you are on your way. For a single node (e.g. Mac OS X) you don’t need to worry about HDFS etc. You only need it when you start building a cluster. When you start using Packetpig on a cluster, you will need to copy the pcaps into HDFS so that all nodes can access them.
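To illustrate why a short capture yields a single output line while a longer one yields several, here is a minimal sketch of time binning in general. It is not Packetpig's implementation: the one-hour bin width, the `(epoch_seconds, byte_count)` record shape, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def bin_by_time(records, bin_seconds=3600):
    """Sum byte counts per fixed-width time bin.

    records: iterable of (epoch_seconds, byte_count) pairs (hypothetical shape).
    """
    totals = defaultdict(int)
    for timestamp, byte_count in records:
        # Round the timestamp down to the start of its bin.
        bin_start = int(timestamp) - int(timestamp) % bin_seconds
        totals[bin_start] += byte_count
    return dict(sorted(totals.items()))


# A short capture: every packet falls inside the same hour.
short_capture = [(1322643600, 100), (1322643900, 250), (1322645000, 50)]
# A longer capture: packets spread across three different hours.
long_capture = short_capture + [(1322647200, 10), (1322650800, 5)]

print(bin_by_time(short_capture))  # one bin
print(bin_by_time(long_capture))   # three bins
```

Each key in the result is the start of a bin, so the number of keys corresponds to the number of lines a binning job would write for that capture.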

    #19310

    Sorry for the confusion in my last post. I was not running Pig on the Hortonworks Sandbox but on Mac OS X, with Hadoop and Pig installed there as directed by the PacketPig install guide.

    I suspect I need to do some more Hadoop configuration after installing, which is assumed but not presented in the PacketPig install guide. For example, I did not configure HDFS or pay attention to the Hadoop configuration file entries, but perhaps I need to.

    I did say the content of the reply was off-topic to the Sandbox.

    #19296

    Larry Liu
    Moderator

    Hi, James

    The jobtracker log is a good starting point for finding out why the Pig job failed. You can also upload it to our FTP site:

    http://hortonworks.com/community/forums/topic/hmc-installation-support-help-us-help-you/

    Larry

    #19252

    I did follow the Mac OS X instructions just to see if I could get packetpig to work.

    I got all the way to the end, but it fails here:

    more output/binning/part-r-00000
    1322643600,171738,142808,338610

    I don’t see any output directory. I see lots of log output when the Pig job runs and no obvious errors, but I don’t see the MapReduce success message either.

    Before I get back to the command prompt, I do see a bunch of output on the screen. It looks like it might be the actual output of the Pig script, but I am not certain.

    This is off-topic to the Sandbox, so maybe I should post the question somewhere else. Where might that be?

    I am running OS X 10.8.3 and used brew, as suggested in the install instructions, to set everything up. There were some warnings along the way, but nothing too noteworthy.

    Jim

    #19220

    Michael,

    Thanks for the detailed comments. I will see if I have the chops to make a stab at a CentOS install guide and then create the pull request to document what I have done. Maybe someone will beat me to it — some Hortonworks people maybe?

    First I want to try out the OS X guide and maybe that will provide some inspiration.

    Jim

    #19219

    abdelrahman
    Moderator

    Hi Michael,

    Thanks a lot for the information you have provided. Packetpig can be installed on Ubuntu and Mac OS X. Have you tried installing it on other operating systems, such as CentOS 6.x?

    Thanks
    -Abdelrahman

    #19218

    Hi James,

    I have written walkthroughs for installing Packetpig under Ubuntu and Mac OS X to try to get people off and running. I was also investigating a Vagrant/Puppet combination to get people up and racing fast as well.

    At the moment Packetpig is not linked to HDN. The articles on Hortonworks came out of a collaboration between Russell Jurney and me on some cool things you can do with Pig.

    Packetpig is maintained at https://github.com/packetloop/packetpig, which is where people raise most of their issues. You can also use @packetpig on Twitter.

    Packetpig is developed mainly by Packetloop (where I work), and we blog about it at http://blog.packetloop.com

    As Packetpig is open source, if anyone would like to submit a pull request updating the install guides for other platforms, that would be *really* helpful ;)

    Hope this helps.

    Michael

    #19190

    Thanks to respondents who are checking into this. If I find out anything I will report here as well.

    It is surprising that, although Hortonworks gives PacketPig good visibility in the blogs it hosts, there are no recipes for installing PacketPig inside a Hortonworks container.

    Not even full HDP, eh?

    #19170

    tedr
    Member

    Hi James,

    Also of note here is that the Sandbox runs on CentOS 6.3, which PacketPig does not currently support, even if you manage to get it installed.

    Thanks,
    Ted.

    #19169

    tedr
    Member

    Hi James,

    Thanks for your continued use of the Sandbox. We currently do not have any instructions for installing PacketPig on CentOS and I was unable to find any in a quick search of the web. I’ll keep digging as time allows, though.

    Thanks,
    Ted.

    #19160

    I found install instructions for PacketPig here: https://github.com/packetloop/packetpig/blob/master/INSTALL.md but they are for Ubuntu, which uses Debian-style install procedures. Does anyone have a modified set of instructions for CentOS (relying on yum)? The GitHub install instructions also use Cloudera resources, which we of course don’t want to rely on!

    #19158

    Larry Liu
    Moderator

    Hi, James

    Here are other blogs which are great examples:

    http://hortonworks.com/blog/big-data-security-part-two-introduction-to-packetpig/

    http://hortonworks.com/blog/packetpig-finding-zero-day-attacks/

    I think the Sandbox should be able to host a PacketPig demo. If you have concerns about performance, you can turn off other services such as HBase and Hive.

    Do let me know if you run into any issues with the demo.

    Thanks
    Larry
