The Hortonworks Blog

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University.

Introduction

This is the third part of the blog-post series about anomaly detection from healthcare data.

In part 1, we described the dataset, the business use-case and our general approach of applying graph algorithms (specifically the personalized-PageRank algorithm) to detect anomalies in the Medicare-B dataset.…

As retailers embark on taking advantage of big data, they are increasingly looking to the Apache Hadoop platform and the partner ecosystem that surrounds it to solve their most pressing challenges. Our partner Microsoft is helping retail organizations around the world gain better insights into their big data with Azure HDInsight and additional Azure analytic services.

As an example, Azure HDInsight, powered by Hortonworks Data Platform (HDP), was a key component in helping Pier 1 Imports realize value from unstructured and structured data to get a 360-degree view of their customers.…

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations in Apache Ambari have focused on opening it up as a pluggable management platform that can automate cluster provisioning, deploy third-party software and provide custom operational and developer views to the end user.

Join us Thursday, March 26 at 10am PT for an online technical workshop where we will cover three key integration points of Apache Ambari (Stacks, Views and Blueprints) and deliver working examples of each.…

At the beginning of February, HP announced their intent to acquire Voltage Security to expand data encryption security solutions for Cloud and Big Data. Today, both companies share their thoughts about the acquisition. Carole Murphy, Director Product Marketing at Voltage Security, and Albert Biketi, Vice President and General Manager at HP Atalla, tell us more about how HP extends the capabilities of every product in the Voltage portfolio, including Voltage’s leadership in securing Hadoop data with data-centric, standards-based technologies.…

Today EMC is launching their EMC® Business Data Lake solution, the first fully-engineered, enterprise-grade solution for a Data Lake running on EMC infrastructure. At Hortonworks, we’ve been assisting customers on their journey to a data lake via a Modern Data Architecture (MDA). Our vision and EMC’s vision are highly complementary, so we’re delighted to be part of the EMC Business Data Lake.

The Data Lake enabled by a Modern Data Architecture allows enterprises to be a Data-First Enterprise.…

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider gathered to hear what’s happening.

There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview of Apache Slider within the Apache Hadoop YARN framework.

Running Dockerized Applications on YARN via Slider

Yu “Thomas” Liu gave a demo of his hot-off-the-IDE Docker deployment work.…

On Tuesday, March 24th at 10am Pacific Time, Duane Lyons, Practice Lead at Clarity Solution Group, will join me, Eric Thorsen, Hortonworks General Manager of Retail and Consumer Products, to discuss “Consumer720”. During our one-hour webinar, Duane and I will lay out the details of using Hortonworks Data Platform (HDP) to enrich a single view of your customers with social media and clickstream feeds to achieve a new level of consumer engagement.…

Introduction

Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and to continue functioning flawlessly while software is being upgraded. In the past, the Hadoop community has added enterprise features such as High Availability (HA) to various components of the stack, snapshots, improved disaster recovery etc.…

Today, we’re delighted to have a guest blog post from Cameron Peek, who leads Partnership and Strategic Sales at CSC, one of our Global System Integration and Hortonworks Data Platform Resellers.

In the spirit of providing our clients with innovative, thought-provoking, actionable content for consideration, Computer Sciences Corporation (CSC) is thrilled to present a two-part webinar series with our global partner, Hortonworks. In 2015, we find most of our clients have moved beyond exploring “What is big data?” and “How can I use big data?” and instead are now focused on “How can I ensure I am successful in my big data projects and see results quickly?”

The first webinar in the series will be presented this Thursday, March 19th at 10 am PST (register here) and will help attendees identify actionable next steps in their data analytics projects, no matter where they are today.…

Forrester recently called Apache Hadoop adoption “mandatory” for the enterprise. For most organizations, moving forward with Hadoop is no longer a question of if, but when. Hadoop-powered insight into big data is enabling market disruption in every industry and the market winners are those who handle that data most effectively and at the lowest cost.

As with any new platform, making decisions on how best to implement and for what purpose can be challenging.…

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University.

Introduction

This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to detect anomalies in healthcare payment records, specifically the publicly available Medicare-B dataset.

In this blog post, we demonstrate the technical steps to compute the similarity graph between medical providers at scale, using HDP and Apache Pig.…

On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica, and Cindy Maike, General Manager, Insurance at Hortonworks, will be joining the Insurance Journal in a webinar on “How to Become an Analytics-Ready Insurer.”

Register for the Webinar on March 25th at 10am Pacific/1pm Eastern time

Josh and Cindy exchange perspectives on what “analytics ready” really means for insurers, and today we are sharing some of our views (join the webinar to learn more).…

Apache Hive is the de facto standard for SQL in Hadoop, with more enterprises relying on this open source project than on any alternative. Stinger.next, a community-based effort, is delivering true enterprise SQL at Hadoop scale and speed.

With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise users. They have come to expect fine-grained access control and auditing within Hive. Apache Ranger provides centralized security administration for Hadoop, and it enables fine-grained access control and deep auditing for Apache components such as Hive, HBase, HDFS, Storm and Knox.…

Paul Boal, Director of Data Management & Analytics at Mercy, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform and Mercy’s journey to the Data Lake.

Technology at Mercy

Mercy has long been committed to using technology to improve medical outcomes for patients. We were among the first health care organizations in the U.S. to have a comprehensive, integrated electronic health record (EHR) providing real-time, paperless access to patient information.…

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed long-running services.

In this release, we also introduced a new framework, Apache™ Slider, for easy onboarding of long-running services on top of YARN.…