Hadoop Ecosystem

User Interface and User Experience are some of the most important aspects of developing a product. No matter how many amazing features something has, a user must be able to access them in order to reap the full benefits of the product. For example, in the Apache Ambari Web UI, add-on apps called Views have, […]

Big data is changing the way enterprises interact with and consume data. Modern data platforms, such as Hortonworks Data Platform (HDP) and Hortonworks Data Flow (HDF), are driving a data revolution by powering new workloads and analytic applications. This week, there are thousands of attendees in San Jose at Hadoop Summit 2016 learning about the […]

“Water, water everywhere, / Nor any drop to drink.” These lines from “The Rime of the Ancient Mariner,” by Samuel Taylor Coleridge, also accurately describe companies that are trying to transform themselves into data-driven companies. These organizations have astronomical volumes of raw data at their disposal, but how do they find that proverbial […]

“The world is one big data problem.” Andrew McAfee, associate director of the Center for Digital Business at MIT Sloan. One whole year of almost daily client meetings and discussions with industry leaders has helped me crystallize my view of an important yet abstract idea into reality. That is, Big Data capabilities or the lack of […]

With the growing volumes of diverse data being stored in the Data Lake, any breach of this enterprise-wide data can be catastrophic, from privacy violations and regulatory infractions to corporate image and long-term shareholder value. Seshu Adunuthula – Head of Analytics Infrastructure, eBay, acting as Track Chair for Governance and Security for Hadoop Summit San Jose, has […]

Debugging distributed systems can be difficult, largely because they are designed to run on many (possibly thousands of) hosts in a cluster. This process typically involves monitoring and analyzing log files spread across the cluster, and if the necessary information is not being logged, service restarts and job redeployment may be required. Not only is […]

There were a lot of great activities and sessions at the recent Apache: Big Data North America in Vancouver, B.C. I enjoyed the technical level of the sessions and meeting others who contribute to projects in the Apache Software Foundation (ASF). The sessions I went to had a high level of interesting technical content, with […]

The world’s top authorities on Apache Hadoop convene at Hadoop Summit San Jose, and one of the top questions to be answered concerns the future and direction of Hadoop. Sanjay Radia – Founder and Architect, Hortonworks, led the track, which selected 13 sessions around this topic. I asked Sanjay what he hoped would […]

At Hadoop Summit San Jose, the Data Science, Analytics and Spark track is sure to be packed. Ram Sriharsha – Product Manager Apache Spark, Databricks, summarizes the 16 sessions in the track as providing technical guidance around: Leveraging Hadoop for analytics is a key use case across industries and represents a critical value proposition for Hadoop. This track […]

Hadoop Summit San Jose is just around the corner. I am amazed at the depth and breadth of the technical sessions and was looking at the Application Development track: Application Development YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empower businesses to […]

In preparation for Hadoop Summit San Jose, I asked the Chair for the Apache Committer Insights track, Andy Feng – VP Architecture, Yahoo!, which were the top 3 sessions he would recommend. Although it was tough to choose only 3, he recommended: HDFS: Optimization, Stabilization and Supportability Speakers: Chris Nauroth from Hortonworks and Arpit Agarwal […]

Welcome back to my blogging adventure. In my Cybersecurity Architecture series, we’ve spent some time discussing the value of an analytic approach to the incident response process. In the last article, Conceptual Cybersecurity Architecture for analytic response, we started to drill into the solution space by giving a high-level architecture to drive our discussion. Let’s […]

A guest blog post from Scott Schlesinger, Principal, Ernst & Young LLP. In July 2015, EY announced its EY Warranty Analytics service offering for the SAP HANA® platform. The service includes EY’s advanced analytics for use with SAP® technology to monitor warranty claims, with the goals of identifying fraudulent activity, reducing costs and improving quality. Automobile […]

To compete in the age of IoAT, organizations are tapping into data sources from a network of physical objects to design new customer experiences. The companies that are furthest along are removing operational inefficiencies from their internal processes. They are using self-learning algorithms and dynamic model deployment for predictive maintenance to accelerate success in […]

Introduction: The community recently announced the release of Apache Storm 1.0.0 Stable. This is a significant release that delivers several features that pertain to enterprise readiness, operational simplicity and ease of use by dramatically enhancing performance, scalability, debuggability and manageability. Highlights: Here are some of the highlights of features introduced in Storm 1.0 […]