The Hortonworks Blog

This year’s Insurance Analytics USA Summit has an exciting new format with presentations and panels that focus on using data to its full potential, creating a data-conscious culture, and applying innovative modeling techniques.

Sessions include “The Future of Insurance: Using Analytics to Take Advantage of the Data-Driven Age of Insurance” and “New and Big Data: Maximizing the Explosion of External Data”. Additionally, the conference will feature interactive roundtable sessions.

A recent research article by Strategy Meets Action identified Analytics in the insurance industry as a Top Strategic Initiative for 2016.…

Big Data and Apache™ Hadoop® are driving tectonic shifts in enterprise data management (EDM) within the financial services industry. Open Enterprise Hadoop and the vendor ecosystem growing up around it are consolidating and standardizing data architectures at the country’s largest banks—transforming expensive, inflexible, and proprietary data landscapes into economic, agile, open source data environments.

Regulatory Pressures Force Architectural Renovation

Banks are accustomed to investing in data solutions just to “keep the lights on.” As data volumes and variety increase, they pour money into legacy platforms, without a commensurate improvement in functionality.…

People have been asking us: Is Google Cloud Dataflow the same thing as Hortonworks DataFlow (HDF)? So we thought we’d take the opportunity to share how we see these two products working together. Both have the word “dataflow” in their name, and both systems are rooted in the premise of dataflow programming, but beyond that there are significant differences.

Google Cloud Dataflow provides an abstraction layer for systems that process and analyze data streams, such as MapReduce, and is designed strictly for the Google Compute Cloud (i.e., a virtual data center).…

We are already more than a month into 2016 and it’s anything but business as usual in Oil and Gas. Current markets are making companies rethink every aspect of their business model, foundational cost structure, and strategy for delivering value to customers and shareholders.

The same thorough scrutiny should be applied to traditional enterprise software tools and platforms. In fact, open source innovations in enterprise software promise dramatic cost optimization opportunities and also changes to the ways that traditional O&G domain challenges are approached.…

Apache Storm is the scalable, fault-tolerant, real-time distributed processing engine that allows you to handle massive streams of data in real time, in parallel, and at scale.

Windowing is one of the most common computations in stream processing, and support for it is a must for deriving actionable insights from real-time data streams. Until now, Apache Storm relied on developers to build their own windowing logic; there were no high-level abstractions for defining a window in a standard way in a Storm topology.…
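Conceptually, a sliding window is just a bounded buffer that emits its contents every sliding interval. The toy Python sketch below illustrates that idea only; it is not Storm’s API, and the class and parameter names here are invented for illustration:

```python
from collections import deque

class SlidingWindow:
    """Toy sliding window (illustrative only, not Storm's windowing API).

    Keeps the last `length` events and emits the window contents every
    `slide` events, which is the essence of a count-based sliding window.
    """
    def __init__(self, length, slide):
        self.length = length                  # window length, in events
        self.slide = slide                    # sliding interval, in events
        self.buffer = deque(maxlen=length)    # old events fall off automatically
        self.seen = 0

    def add(self, event):
        """Add one event; return the window contents when it slides, else None."""
        self.buffer.append(event)
        self.seen += 1
        if self.seen % self.slide == 0:
            return list(self.buffer)
        return None

# Window of length 3 that slides every 2 events, fed the stream 1..7:
w = SlidingWindow(length=3, slide=2)
windows = []
for event in range(1, 8):
    out = w.add(event)
    if out is not None:
        windows.append(out)
# windows: [[1, 2], [2, 3, 4], [4, 5, 6]]
```

Note how the first emitted window holds only two events: the window has not filled yet, which is exactly the behavior a framework-level abstraction has to define for you so every topology does not reinvent it.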


Hadoop All Grown Up

It’s amazing how much Apache Hadoop and its extended ecosystem have grown in the last 10 years. I read through Owen’s “Ten Years of Herding Elephants” blog post and downloaded the early Docker image of his first patch.  It reminded me of the days it took me to do my first Hadoop install and the effort it took to learn the Java MapReduce basics well enough to understand the infamous WordCount example.  …
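For anyone who skipped that rite of passage: the classic WordCount splits into a map phase that emits (word, 1) pairs and a reduce phase that sums them per word. A minimal Python sketch of the idea follows; this is not Hadoop’s Java API, and the function names are invented for illustration:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word, like the WordCount mapper."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts per word (the shuffle/sort step is implicit here)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reduce_phase(map_phase(["Hello Hadoop", "hello world"]))
# result: {"hello": 2, "hadoop": 1, "world": 1}
```

In real Hadoop the mapper and reducer run as separate JVM tasks across the cluster, with the framework grouping pairs by key between the two phases; that grouping is what the single `defaultdict` stands in for above.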

This year’s Insurance Canada Technology Conference will focus on the impact of new technologies in the insurance industry. Key topics include telematics, analytics, the Internet of Things (IoT), and how these capabilities enable insurance companies to improve underwriting and reduce risk.

A recent article at Strategy Meets Action identified digital transformation in the insurance industry as a Top 10 Trend Influencing 2016.

Join Cindy Maike, GM of Insurance from Hortonworks, as she discusses “Data-Driven vs.…

Author: Michael Bironneau, Data Scientist, Open Energi

At Open Energi, we think of our service as an automated, virtual power station. Whenever the electric grid experiences sudden, unforeseen surges in supply or demand, assets under the control of our Dynamic Demand algorithm automatically pick up the slack – just like a power station would, but cheaper and cleaner.

To prove that we’ve delivered this service, and to keep it running at its optimum, we need to analyse large amounts of data relatively quickly.…

It was 10 years ago today (Feb 2) that my first patch (https://issues.apache.org/jira/browse/NUTCH-197) went into the code that two days later became Hadoop (https://issues.apache.org/jira/browse/HADOOP-1).

I had been working on Yahoo Search’s WebMap, which was the back end that analyzed the web for the search engine.  We had been working on a C++ implementation of GFS and MapReduce, but after hiring Doug Cutting, we decided that it would be easier to get Yahoo’s permission to contribute to code that was already open source than to open source our C++ project.…

Do you like looking for the needle in a field of haystacks? Do I have a job for you: security operations center (SOC) analyst. You will spend your days looking at hundreds of thousands of alerts, created by rules engines, of which only a very few a week actually matter.  Your job is to manually review all of them, filtering out the noise to find the few that do.  Yes, it will take hours to review each one, and there won’t be enough time in the day to review them all, but what can you do?…

The ConnecteDriver conference, networking and exhibition is currently underway in Brussels, Belgium. Tomorrow, 28 January, Grant Bodley from Hortonworks will be presenting on The Information Superhighway for Automotive Transformation. Following his presentation, Grant will participate in a panel discussion on Connected Car Data.

The abstract for Grant’s presentation is below. You can see the full conference agenda here and register at the ConnecteDriver website.

Abstract:

Big Data, the Internet of Anything (IoAT), and the Connected Car have created a new Information Superhighway that fundamentally changes the relationship between automakers and car buyers.…

Increasingly, financial services firms manage global operations across multiple countries and continents. The natural consequences of their international expansions make compliance with banking regulations more challenging.

With the proliferation of diverse financial products tailored to local markets and tighter integration between commercial and investment banking operations, compliance teams need to investigate new and different relationships in order to identify suspicious financial activities and report them to authorities.

Regulations such as the Bank Secrecy Act, the USA PATRIOT Act and the Foreign Account Tax Compliance Act require United States banks, insurance companies and capital markets firms to file Suspicious Activity Reports (SARs) if they suspect that transactions are laundering funds to support fraud, terrorist financing or other crimes.…

A Beginner’s Guide to Becoming an Apache Contributor

Venkatesh Sellappa, Teradata

My name is Venkatesh Sellappa. My background is primarily the application of analytics in the Big Data space, before either of those was called that; we used to just call it programming. My session is an account of my personal journey into the often contentious and confusing open source world.

Where did it come from and where is it going? What is the economic incentive for people to contribute?…

Recently, Apache Spark set the world of Big Data on fire. With its promise of amazing performance and comfortable APIs, some thought that Spark was bound to replace Hadoop MapReduce. But will it? Looking closely, Spark appears instead to be a natural complement to Apache Hadoop YARN, the architectural center of Hadoop…

Hadoop is already transforming many industries, accelerating Big Data projects to help businesses translate information into competitive advantage.…

Advanced Execution Visualization of Spark Jobs

Author: Zoltán Zvara, Márton Balassi, András Garzó, Hungarian Academy of Sciences, in collaboration with Ericsson

Understanding the physical plan of a big data application is often crucial for tracking down bottlenecks and faulty behavior. Although Apache Spark offers a useful Web UI component for monitoring jobs and understanding their logical plan, it lacks a tool for understanding the physical plan of the task scheduler and for monitoring execution at a very low level, including the communication triggered by RDDs and remote block requests.…