The Hortonworks Blog

Apache Storm is a scalable, fault-tolerant, distributed real-time processing engine that lets you handle massive streams of data in parallel and at scale.

Windowed computation is one of the most common use cases in stream processing. Support for windowing is a must for deriving actionable insights from real-time data streams. So far, Apache Storm has relied on developers to build their own windowing logic, and there were no high-level abstractions for defining a window in a standard way in a Storm topology.…
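To make the idea of a window concrete, here is a minimal, framework-independent sketch of a count-based sliding window. It is not Storm's API; the function name `sliding_windows` and the parameters `window_length` and `sliding_interval` are assumptions chosen for illustration only.

```python
from collections import deque


def sliding_windows(stream, window_length, sliding_interval):
    """Yield the most recent `window_length` items every `sliding_interval` items.

    Hypothetical helper for illustration; it does not reflect any Storm API.
    """
    # A bounded deque silently evicts the oldest item once it is full.
    window = deque(maxlen=window_length)
    for count, item in enumerate(stream, start=1):
        window.append(item)
        # Emit a snapshot of the window each time the slide interval elapses.
        if count % sliding_interval == 0:
            yield list(window)


# Example: average of the last 4 readings, recomputed every 2 readings.
readings = [3, 5, 2, 8, 6, 1]
averages = [
    sum(w) / len(w)
    for w in sliding_windows(readings, window_length=4, sliding_interval=2)
]
print(averages)
```

Note that the first window emitted here is only partially filled, because the sketch emits on every slide interval rather than waiting for a full window. A real stream processor may handle that edge differently.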

 

Hadoop All Grown Up

The growth of Apache Hadoop and its extended ecosystem over the last 10 years is amazing. I read through Owen’s “Ten Years of Herding Elephants” blog and downloaded the early Docker image of his first patch. It reminded me of the days it took me to do my first Hadoop install and the effort it took to learn the Java MapReduce basics to understand the infamous WordCount example.…

This year’s Insurance Canada Technology Conference will focus on the impact of new technologies in the insurance industry. Key topics include telematics, analytics, the Internet of Things (IoT), and how these capabilities enable insurance companies to improve underwriting and reduce risk.

A recent article at Strategy Meets Action identified digital transformation in the insurance industry as a Top 10 Trend Influencing 2016.

Join Cindy Maike, GM of Insurance from Hortonworks, as she discusses “Data-Driven vs.…

Author: Michael Bironneau, Data Scientist, Open Energi

At Open Energi, we think of our service as an automated, virtual power station. Whenever the electric grid experiences sudden, unforeseen surges in supply or demand, assets under the control of our Dynamic Demand algorithm automatically pick up the slack – just like a power station would, but cheaper and cleaner.

In order to prove that we’ve delivered this service and keep it running at optimum, we need to analyse large amounts of data relatively quickly.…

It was 10 years ago today (Feb 2) that my first patch (https://issues.apache.org/jira/browse/NUTCH-197) went into the code that two days later became Hadoop (https://issues.apache.org/jira/browse/HADOOP-1).

I had been working on Yahoo Search’s WebMap, which was the back end that analyzed the web for the search engine. We had been working on a C++ implementation of GFS and MapReduce, but after hiring Doug Cutting, we decided that it would be easier to get Yahoo’s permission to contribute to code that was already open source than to open source our C++ project.…

Do you like looking for the needle in a field of haystacks? Do I have a job for you: security operations center (SOC) analyst. You will spend your days looking at hundreds of thousands of alerts – created by rules engines – of which only a few each week actually matter. Your job is to manually review all of them, filtering out the noise to find the few that matter. Yes, it will take hours to review each one, and there won’t be enough time in the day to review them all, but what can you do?…

The ConnecteDriver conference, networking and exhibition is currently underway in Brussels, Belgium. Tomorrow, 28 January, Grant Bodley from Hortonworks will be presenting on The Information Superhighway for Automotive Transformation. Following his presentation, Grant will participate in a panel discussion on Connected Car Data.

The abstract for Grant’s presentation is below. You can see the full conference agenda here and register at the ConnecteDriver website.

Abstract:

Big Data, the Internet of Anything (IoAT), and the Connected Car have created a new Information Superhighway that fundamentally changes the relationship between automakers and car buyers.…

A Beginner’s Guide to Becoming an Apache Contributor

Venkatesh Sellappa, Teradata

My name is Venkatesh Sellappa. My background is primarily in the application of analytics in the Big Data space, before either of them was called that. We used to just call it programming. My session is an account of my personal journey into the often contentious and confusing open source world.

Where did it come from and where is it going? What is the economic incentive for people to contribute?…

Recently, Apache Spark set the world of Big Data on fire. With its promise of amazing performance and comfortable APIs, some thought that Spark was bound to replace Hadoop MapReduce. Or is it? On closer inspection, Spark appears instead to be a natural complement to Apache Hadoop YARN, the architectural center of Hadoop…

Hadoop is already transforming many industries, accelerating Big Data projects to help businesses translate information into competitive advantage.…

Advanced Execution Visualization of Spark jobs

Authors: Zoltán Zvara, Márton Balassi, András Garzó, Hungarian Academy of Sciences in collaboration with Ericsson

Understanding the physical plan of a big data application is often crucial for tracking down bottlenecks and faulty behavior. Although Apache Spark offers a useful Web UI component for monitoring and understanding the logical plan of jobs, it lacks a tool for understanding the physical plan of the task scheduler and for monitoring execution at a very low level, along with the communication triggered by RDDs and remote block requests.…

Hello everyone, and welcome to the start of my blogging adventure. I’m Mike Schiebel, Cybersecurity Strategist at Hortonworks, where I focus on bringing enterprise-level security features into the Hadoop ecosystem and providing input into the Apache Metron open source project. I figured introductions are in order to explain the where and why behind my blog series.

Who am I?

I’ve taken a long and twisting road before ending up at Hortonworks.…

A couple of months ago I joined Hortonworks. There was an undeniable pull to go into the fire of crazy fast innovation and growth. About four seconds in, I realized there was much more to it than the pace of execution and growth: a bigger opportunity to be a part of something game-changing. The opportunity to partake in trailblazing the world of data. The opportunity to offer a unique value proposition of truly 100% open technology.…

Today we proudly announced that Arkena, one of Europe’s leading media services companies, is using Hortonworks Data Platform (HDP™) to provide its media customers with an advanced analytics platform to deliver content to OTT customers through its content delivery network (CDN). This is a guest post from Reda Benzair, Vice President of Technical Development at Arkena. You can also join Arkena and Hortonworks on February 16th for a live and on-demand webinar about their Advanced Analytics Platform.…

We take pride in producing valuable technical blogs and sharing them with a wider audience. Of all the blogs published in 2015 on our website, the following were most popular:

  • Learn how Zeppelin, Spark SQL and MLlib can be combined to simplify exploratory Data Science.

HDFS is a core part of any Hadoop deployment, and to ensure that data is protected in the Hadoop platform, security needs to be baked into the HDFS layer. HDFS is protected using Kerberos authentication, with authorization handled by POSIX-style permissions/HDFS ACLs or by Apache Ranger.

Apache Ranger (http://hortonworks.com/hadoop/ranger/) is a centralized security administration solution for Hadoop that enables administrators to create and enforce security policies for HDFS and other Hadoop platform components.…