The Hortonworks Blog

Posts categorized by : Geolocation

Trifacta is a Hortonworks Technology Partner, a pioneer in data transformation, recently certified with HDP 2.1. Here, Trifacta’s CTO and Co-founder Sean Kandel, talks about their Predictive Interaction ™ solution with Hortonworks Data Platform.

“I spend more than half my time integrating, cleansing and transforming data without doing any actual analysis. Most of the the time I’m lucky if I get to do any analysis.” – Data Scientist [1]

The most commonly reported use of Hadoop today is data transformation. …

This is the sixth in our series on modern data architectures across industry verticals. Others in the series are:

The United States is enjoying resurgent fossil fuel production. In fact, the International Energy Agency estimates that by 2016, the U.S. will surpass Saudi Arabia and Russia to become the world’s largest oil producer.…

This is the fifth in our series on modern data architectures across industry verticals. Others in the series are:

Consumers have never generated so much data on how they research, discuss and buy products. This new data is valuable for shaping and promoting a brand or product, but it doesn’t line up neatly to fit in pre-defined, tabular formats.…

This is the fourth in our series on modern data architectures across industry verticals. Others in the series are:

We’ve probably all heard the famous quote by John Wanamaker, the father of modern advertising: “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.”

Wanamaker would love Apache Hadoop for retail applications, because it diminishes (or eliminates) the dilemma he described.…

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how those will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second data is created and then for decades of historical analysis of that data stored in HDFS.…