Hortonworks Sandbox Forum

Looking for Server Log Data sample

  • #29574

    The July in Review Hortonworks newsletter that just showed up in my mailbox included a link to a YouTube video showing a nice use case for Hadoop in an enterprise security context. The video mentions being able to accomplish this yourself using the Sandbox. I wonder if someone at Hortonworks can provide the sample data that supports the video. The use case is a spike in failed VPN login attempts whose patterns are revealed by processing server log files and then visualizing them, first with Elasticsearch and then with Excel 2013. The raw data is shown in the Sandbox file browser first, and that data then supports the visualizations.

    I have a version of the Sandbox running to which I have added Elasticsearch. It would be great to have some raw log data to experiment with. Perhaps this data could even be used for the Sandbox tutorials.

    Who at Hortonworks could I contact to see about getting some demo data to try and duplicate the YouTube scenario in my environment?


  • #29578

    Hi James,

    I’ll ask around and see if we can provide the logs in that video. In the meantime, there are some web logs on the sandbox already: the httpd logs in /var/log/httpd. I’m sure these could be copied to HDFS and massaged a bit.
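
    As a rough illustration of the "massaging" step, here is a minimal Python sketch that parses access-log lines into fields and flattens them to tab-separated text that is easier to load once copied to HDFS. It assumes the httpd logs use the Apache Common Log Format, which may not match the sandbox configuration. The names LOG_PATTERN, parse_line, status_counts and to_tsv are made up for this example, and the sample line is the well-known example from the Apache documentation, not data from the sandbox.

```python
import re
from collections import Counter

# Assumption: lines start with the Apache Common Log Format fields:
#   host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def status_counts(lines):
    """Count HTTP status codes across the lines that parse cleanly."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            counts[fields["status"]] += 1
    return counts


def to_tsv(fields):
    """Flatten selected parsed fields into one tab-separated row."""
    keys = ("host", "time", "request", "status", "size")
    return "\t".join(fields[key] for key in keys)


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(to_tsv(parse_line(sample)))
```

    Lines that do not match the assumed format are skipped rather than raising an error, so the pattern would need adjusting if the logs use a custom LogFormat.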


    Cheryle Custer

    Hi James,

    Can you email me directly at hwsandbox at hortonworks dot com? I may be able to help you out.


    Cheryle Custer

    Today, we published the Server Log tutorial. In this tutorial, you’ll find a pretty neat Python script to generate your own Server Logs. Keep reading into the appendix as we have further discussion on creating events in Flume. Enjoy!
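
    For anyone who wants to experiment before working through the tutorial, the sketch below shows the general idea of generating synthetic log events in Python. It is not the script from the tutorial. The pipe-delimited layout (timestamp, IP address, login status), the function name generate_events, the default failure rate and the start date are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_events(count, seed=0, start=datetime(2000, 1, 1), failure_rate=0.2):
    """Yield `count` synthetic log lines as 'timestamp|ip|status'.

    The layout and the default values are hypothetical placeholders,
    not the format used by the Server Log tutorial.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = start
    for _ in range(count):
        # Advance the clock by a random gap of 1 to 30 seconds.
        current += timedelta(seconds=rng.randint(1, 30))
        ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        status = "FAILURE" if rng.random() < failure_rate else "SUCCESS"
        yield f"{current.isoformat()}|{ip}|{status}"


if __name__ == "__main__":
    for line in generate_events(5):
        print(line)
```

    Raising failure_rate for part of a run is one way to imitate a spike in failed logins like the one described in the video.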


    Cheryle, I updated the tutorials this morning and it now shows version 1.0.006, but I don’t see a Tutorial 12. Is this the current version of the tutorials, or am I still missing one?


    Scratch that, I can see the new tutorial now. I moved my sandbox to a different machine (just copied the VM). I have no idea why this would make a difference, but it seems to have.

    Cheryle Custer

    Hi Michael,

    I’m glad the tutorial showed up. If the tutorial version is 1.0.006, that is the current version. If Tutorial 12 does not show up in the tutorial pane, try these steps:

    1) click in the tutorial pane and refresh the page
    2) restart the entire browser
    3) try a different browser
    4) clear the browsing history and cache

    Particularly on Chrome, we find that you sometimes have to clear the browsing history and cache before the new material shows up. That’s probably why moving to a new machine solved the problem: you had a fresh browser.
