The Hortonworks Blog

Posts categorized by: Server Logs

This is the second in a series of blog posts exploring how to write data-driven applications in Java using the Cascading SDK. The posts in the series are:

  • WordCount
  • Log Parsing

    Historically, programming languages and software frameworks have evolved in a single direction, with a single purpose: to achieve simplicity, hide complexity, improve developer productivity, and make coding easier. In the process, they have fostered the innovation we see, and benefit from, today.

    Is anyone among you “young” enough to admit to writing code in microcode and assembly language?…

    Trifacta, a Hortonworks Technology Partner and a pioneer in data transformation, was recently certified with HDP 2.1. Here, Trifacta’s CTO and co-founder, Sean Kandel, talks about their Predictive Interaction™ solution with Hortonworks Data Platform.

    “I spend more than half my time integrating, cleansing and transforming data without doing any actual analysis. Most of the time I’m lucky if I get to do any analysis.” – Data Scientist [1]

    The most commonly reported use of Hadoop today is data transformation. …

    Elasticsearch’s engine integrates with Hortonworks Data Platform 2.0 and YARN to provide real-time search and access to information in Hadoop.

    See it in action: register for the Hortonworks and Elasticsearch webinar on March 5th, 2014 at 10 am PST/1 pm EST to see the demo and an outline of best practices for integrating Elasticsearch and HDP 2.0 to extract maximum insight from your data.…

    This is the fourth in our series on modern data architectures across industry verticals. Others in the series are:

    We’ve probably all heard the famous quote by John Wanamaker, the father of modern advertising: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”

    Wanamaker would love Apache Hadoop for retail applications, because it diminishes (or eliminates) the dilemma he described.…

    I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

    Himanshu discusses Apache Storm’s five key benefits and how they will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second the data is created, followed by decades of historical analysis of that data stored in HDFS.…