Looking for Server Log Data sample


This topic contains 6 replies, has 4 voices, and was last updated by Cheryle Custer 1 year, 7 months ago.

  • Creator
  • #29574

    The July in Review Hortonworks newsletter that just showed up in my mailbox included a link to a YouTube video (http://www.youtube.com/watch?feature=player_embedded&v=BPC_mClNSXk#at=11) showing a nice use case for Hadoop in an enterprise security context. The video mentions being able to accomplish this yourself using the Sandbox. I wonder if someone at Hortonworks can provide the sample data that supports the video. The use case is a spike in failed VPN login attempts whose patterns are revealed by processing server log files and then visualizing the results, first with Elasticsearch and then with Excel 2013. The raw data is shown in the Sandbox file browser first, and that same data then drives the visualizations.

    I have a version of the Sandbox running to which I have added Elasticsearch. It would be great to have some raw log data to experiment with. Perhaps this data could even be used for the Sandbox tutorials.

    Who at Hortonworks could I contact to see about getting some demo data to try and duplicate the YouTube scenario in my environment?


Viewing 6 replies - 1 through 6 (of 6 total)


  • Author
  • #33677

    Cheryle Custer

    Hi Michael,

    I’m glad the tutorial showed up. If the tutorial version is 1.0.006, that is the current version. If Tutorial 12 in the tutorial pane does not show up try these steps:

    1) click in the tutorial pane and refresh the page
    2) restart the entire browser
    3) try a different browser
    4) clear the browsing history and cache

    Particularly on Chrome, we find that you sometimes have to clear the browsing history and cache for the new material to show up. That’s probably why moving to a new machine helped solve the problem, as you had a new browser.


    Scratch that, I can see the new tutorial now. I moved my sandbox to a different machine (just copied the VM). No idea why this would make a difference, but it seems to have.


    Cheryle, I updated the tutorials this morning and it now shows version 1.0.006, but I don’t see a Tutorial 12. Is this the current version of the tutorials, or am I still missing one?


    Cheryle Custer

    Today, we published the Server Log tutorial. In this tutorial, you’ll find a pretty neat Python script to generate your own Server Logs. Keep reading into the appendix as we have further discussion on creating events in Flume. Enjoy!
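For anyone who wants test data before working through the tutorial, here is a minimal sketch of a log generator. It is not the script from the tutorial: the pipe-delimited line format, the `vpn-login` event name, and the function name `generate_log_lines` are all assumptions made up for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_log_lines(n, start, seed=0, fail_rate=0.1):
    """Return n synthetic login events as pipe-delimited strings.

    The line layout (timestamp|ip|event|status) is hypothetical and is
    not the format used by the tutorial's own script.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    lines = []
    t = start
    for _ in range(n):
        # Advance the clock by a random gap of 1 to 30 seconds.
        t += timedelta(seconds=rng.randint(1, 30))
        ip = "10.0.{}.{}".format(rng.randint(0, 255), rng.randint(1, 254))
        status = "FAILED" if rng.random() < fail_rate else "SUCCESS"
        lines.append("{}|{}|vpn-login|{}".format(
            t.strftime("%Y-%m-%dT%H:%M:%S"), ip, status))
    return lines


if __name__ == "__main__":
    for line in generate_log_lines(5, datetime(2013, 7, 1), fail_rate=0.5):
        print(line)
```

Raising `fail_rate` for one batch of lines is a simple way to simulate a spike in failed logins like the one in the video.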


    Cheryle Custer

    Hi James,

    Can you email me directly at hwsandbox at hortonworks dot com? I may be able to help you out.




    Hi James,

    I’ll ask around and see if we can provide the logs in that video. In the meantime, there are some web logs on the sandbox already: the httpd logs in /var/log/httpd, which I’m sure could be copied to HDFS and massaged a bit.
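As a rough idea of what the "massaging" could look like, here is a minimal sketch that parses access-log lines into tab-separated records. It assumes the httpd logs use the standard Apache common or combined log format; the function names are made up for illustration, and the cleanup could run either before or after copying the files to HDFS (for example with `hadoop fs -put`).

```python
import re
from collections import Counter

# Leading fields of the Apache common log format. Combined-format lines
# (with referer and user agent appended) also match, since only the
# prefix is checked. Assumes the default httpd LogFormat is in use.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

FIELDS = ("host", "time", "request", "status", "size")


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # httpd writes "-" when no response body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def to_tsv(rec):
    """Flatten a parsed record into one tab-separated line."""
    return "\t".join(str(rec[k]) for k in FIELDS)


def status_counts(lines):
    """Count HTTP status codes, skipping lines that fail to parse."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[rec["status"]] += 1
    return counts
```

Tab-separated output is only one choice; it is convenient because a Hive table or Pig script can load it without a custom parser.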

