Hadoop Insights

News about Hadoop in the wild; how Hadoop is being used; how Hadoop can be used.

Author: Michael Bironneau, Data Scientist, Open Energi

At Open Energi, we think of our service as an automated, virtual power station. Whenever the electric grid experiences sudden, unforeseen surges in supply or demand, assets under the control of our Dynamic Demand algorithm automatically pick up the slack – just like a power station would but cheaper and cleaner.

In order to prove that we’ve delivered this service and keep it running at optimum, we need to analyse large amounts of data relatively quickly.…

Do you like looking for the needle in a field of haystacks? Do I have a job for you: security operations center (SOC) analyst. You will spend your days looking at hundreds of thousands of alerts, all created by rules engines, of which only a handful each week actually matter. Your job is to manually review all of them, filtering out the noise to find the few that matter. Yes, it will take hours to review each one and there won’t be enough time in the day to review them all; but what can you do?…

A Beginner’s Guide to Becoming an Apache Contributor

Venkatesh Sellappa, Teradata

My name is Venkatesh Sellappa. My background is primarily application of analytics in the Big Data Space, before either of them was called that. We used to just call it programming. My session is an account of my personal journey into the often contentious and confusing open source world.

Where did it come from and where is it going? What is the economic incentive for people to contribute?…

Recently, Apache Spark set the world of Big Data on fire. Because it promises amazing performance and comfortable APIs, some thought that Spark was bound to replace Hadoop MapReduce. But is it? On closer inspection, Spark appears rather to be a natural complement to Apache Hadoop YARN, the architectural center of Hadoop…

Hadoop is already transforming many industries, accelerating Big Data projects to help businesses translate information into competitive advantage.…

Advanced Execution Visualization of Spark jobs

Author: Zoltán Zvara, Márton Balassi, András Garzó, Hungarian Academy of Sciences in collaboration with Ericsson

Understanding the physical plan of a big data application is often crucial for tracking down bottlenecks and faulty behavior. Although Apache Spark offers a useful Web UI component for monitoring and understanding the logical plan of jobs, it lacks a tool for understanding the physical plan of the task scheduler and for monitoring execution at a very low level, along with the communication triggered by RDDs and remote block requests.…

Machine Learning in Big Data – Look Forward or Be Left Behind

Bill Porto, Senior Engineering Analyst, RedPoint Global Inc.

Computers? Not so much. One of the biggest developments – and challenges – in technology has been the advent of machine learning. But even as we make major strides in the age of Big Data, applying machine learning to our data is something that few have effectively achieved. Creating models to predict customer response or to segment customer data into set categories are “predictable” use cases.…

Overview of Apache Flink: the 4G of Big Data Analytics Frameworks

Author: Slim Baltagi, Director of Big Data engineering, Capital One

I want to thank those of you who voted for my proposal, and I look forward to meeting many of you in Dublin. I’ll be around for the conference and would gladly welcome any follow-on conversations.

About me

I am currently a Director of Big Data engineering at Capital One.…

It’s our pleasure to host Ryan Peterson, Chief Solution Strategist at EMC, as a guest blogger to expand upon another great step in our partnership to deliver compelling customer solutions through joint engineering efforts.  Follow Ryan @BigDataRyan.

Object storage isn’t a new concept, and EMC has been innovating around it since the beginning. Take our Centera and Atmos products as key examples. The first Centera was created around the idea that objects could store much larger quantities of data in a single store than a file system could, while the other aspect of Centera was a rich set of security and compliance features that file systems had not been able to achieve.…

The advent of connected manufacturing has ushered in an era where low-cost machine sensors take thousands of measurements per second at many points across the manufacturing process. This stream of sensor data enables manufacturers to quickly detect emerging anomalies and solve issues before they impact yield and quality.

Big Data insights enable predictive analytics for those rapid, proactive process adjustments. Manufacturers can capitalize on this opportunity by following an approach that combines the power of Teradata with Hortonworks Data Platform’s storage and compute efficiencies at extreme scale.…

Recent innovations in the Internet-enabled Connected Cars that we drive today have spawned a whole new set of opportunities and challenges for carmakers. The opportunities come from the ability to capture detailed, current data on how drivers actually operate their cars and how those cars respond to that use.

That data can be extraordinarily valuable for uses such as preventative maintenance, product development, manufacturing optimization and recall avoidance.…

I recently had the pleasure of visiting with Arvind Battula, Sr. Data Scientist at Schlumberger. The following is a transcript of our conversation, in which we discussed his background as a chemical and mechanical engineer, his move onto the Data and Analytics team as a data scientist, his focus areas for data science in oil and gas, and technologies that he believes will help transform the industry.…

The journey to data-driven business transformation can be confusing and challenging. At Hortonworks, we understand this, and we offer a number of tools that help companies map out their journey to fully realize the value of their Big Data.

The journey begins with understanding the opportunities unique to your business, and understanding how the maturity of your organization enables or inhibits your ability to strategically pursue Big Data programs aligned to your business goals.…

There’s excitement in the air as one of Benelux’s largest Big Data conferences, “Big Data Expo”, comes to Utrecht in the Netherlands.

We’re sponsoring and you’ll find our experts Chris Harris and Jhon Masschelein presenting such topics as “5 Steps for Effective use of Apache Spark in Hortonworks Data Platform 2.3” and “Lessons Learned: 5 Common Hadoop Use Cases”. You can register here.

As Hortonworks continues to extend its footprint in Europe, we’re seeing some exciting use cases and increasing momentum in enterprise adoption of Hadoop.…

In a world that creates 2.5 quintillion bytes of data every year, it is extremely cheap to collect, store and curate all the data you will ever care about. Data is becoming the de facto largest untapped asset. So how can organizations take advantage of unprecedented amounts of data? The answer is new innovations and new applications. We are clearly entering a new era of modern data applications.

I would like to take the opportunity to share my Hadoop journey in the past 10 years, and discuss where I see the Hadoop technology going in the next decade.…

Since the partnership between Hortonworks and SAS began, we have created some awesome assets (e.g., the SAS Data Loader sandbox tutorial, educational webinars and an array of blogs) that have given Hadoop and Big Data enthusiasts hands-on training with Apache Hadoop and SAS’ powerful analytics solutions. You can find more details about our partnership and resources here: http://hortonworks.com/partner/sas

To continue the momentum, we have Paul Kent, Vice President of Big Data at SAS, share his insights on the value of YARN and the benefits it brings to SAS and its users, this time around SAS Grid and YARN.…