The Hortonworks Blog

Posts categorized by: HDP

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D., research assistant at Stanford University.

Introduction

This is the third part of our blog post series on anomaly detection in healthcare data.

In part 1, we described the dataset, the business use case, and our general approach of applying graph algorithms (specifically the personalized PageRank algorithm) to detect anomalies in the Medicare-B dataset.…
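To give a rough flavor of the idea (this is an illustrative sketch, not the pipeline from the series), personalized PageRank can be run over a provider-similarity graph with networkx; the provider names, edge weights, and seed choice below are all hypothetical:

```python
import networkx as nx

# Hypothetical similarity graph: nodes are providers, edge weights are
# similarity scores (toy data for illustration, not from Medicare-B).
G = nx.Graph()
G.add_weighted_edges_from([
    ("provider_a", "provider_b", 0.9),
    ("provider_b", "provider_c", 0.8),
    ("provider_a", "provider_c", 0.7),
    ("provider_d", "provider_a", 0.1),  # weakly connected: candidate anomaly
])

# Personalize the random walk to restart at a chosen peer group; nodes
# scoring far below their peers are anomaly candidates.
seeds = {"provider_a": 1.0}
scores = nx.pagerank(G, alpha=0.85, personalization=seeds, weight="weight")

for node, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(node, round(score, 4))
```

Here the weakly connected `provider_d` ends up with the lowest score, which is the intuition behind flagging it for review.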

As retailers look to take advantage of big data, they are increasingly turning to the Apache Hadoop platform and the partner ecosystem that surrounds it to solve their most pressing challenges. Our partner Microsoft is helping retail organizations around the world gain better insights into their big data with Azure HDInsight and additional Azure analytic services.

As an example, Azure HDInsight, powered by Hortonworks Data Platform (HDP), was a key component in helping Pier 1 Imports realize value from unstructured and structured data to get a 360-degree view of their customers.…

Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovation in Apache Ambari has focused on evolving it into a pluggable management platform that can automate cluster provisioning, deploy third-party software, and provide custom operational and developer views to the end user.

Join us Thursday, March 26 at 10am PT for an online technical workshop where we will cover three key Apache Ambari integration points: Stacks, Views, and Blueprints, with working examples of each.…
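As a taste of the Blueprints integration point: a blueprint is a JSON document that captures a stack identity plus host groups mapping components to hosts. The sketch below builds a minimal single-node blueprint in Python; the blueprint name and component layout are illustrative, not a recommended topology.

```python
import json

# A minimal Ambari blueprint: stack identity plus host groups mapping
# components to hosts. The layout here is a hypothetical example.
blueprint = {
    "Blueprints": {
        "blueprint_name": "single-node-hdp",  # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.2",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "DATANODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "NODEMANAGER"},
            ],
        }
    ],
}

# A blueprint like this would be registered with Ambari's REST API
# (e.g. POST /api/v1/blueprints/<name>) before creating a cluster from it.
print(json.dumps(blueprint, indent=2))
```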

At the beginning of February, HP announced their intent to acquire Voltage Security to expand data encryption security solutions for Cloud and Big Data. Today, both companies share their thoughts about the acquisition. Carole Murphy, Director Product Marketing at Voltage Security, and Albert Biketi, Vice President and General Manager at HP Atalla, tell us more about how HP extends the capabilities of every product in the Voltage portfolio, including Voltage’s leadership in securing Hadoop data with data-centric, standards-based technologies.…

Today EMC is launching the EMC® Business Data Lake solution, the first fully-engineered, enterprise-grade solution for a Data Lake running on EMC infrastructure. At Hortonworks, we have been assisting customers on their journey to a data lake via a Modern Data Architecture (MDA). Our vision and EMC's are highly complementary, so we are delighted to be part of the EMC Business Data Lake.

The Data Lake enabled by a Modern Data Architecture allows an enterprise to become a Data-First Enterprise.…

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider congregated to hear what's happening.

There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview of Apache Slider within the Apache Hadoop YARN framework.

Running Dockerized Applications on YARN via Slider

Yu “Thomas” Liu gave a demo of his hot-off-the-IDE Docker deployment work.…

On Tuesday, March 24th at 10am Pacific Time, Duane Lyons, Practice Lead at Clarity Solution Group, will join me, Eric Thorsen, Hortonworks General Manager of Retail and Consumer Products, to discuss “Consumer720”. During our one-hour webinar, Duane and I will lay out the details of using Hortonworks Data Platform (HDP) to enrich a single view of your customers with social media and clickstream feeds to achieve a new level of consumer engagement.…

We are excited to announce the general availability of Hortonworks Sandbox on Microsoft Azure. Hortonworks Sandbox is already a very popular environment for developers, data scientists, and administrators to learn and experiment with the latest innovations in Hortonworks Data Platform.

The hundreds of innovations span Hadoop, Kafka, Storm, Hive, Pig, YARN, Ambari, Falcon, Ranger, and the other components that make up HDP. We also provide tutorials to help you get a jumpstart on using HDP to implement a Modern Data Architecture at your organization.…

Introduction

Today, organizations use the Apache Hadoop™ stack as a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster, and the services running on it, is to be highly available and to continue functioning without interruption while software is being upgraded. In the past, the Hadoop community has added enterprise features to various components of the stack, such as High Availability (HA), snapshots, and improved disaster recovery.…

Forrester recently called Apache Hadoop adoption “mandatory” for the enterprise. For most organizations, moving forward with Hadoop is no longer a question of if, but when. Hadoop-powered insight into big data is enabling market disruption in every industry and the market winners are those who handle that data most effectively and at the lowest cost.

As with any new platform, making decisions on how best to implement and for what purpose can be challenging.…

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D., research assistant at Stanford University.

Introduction

This is the second part of our blog post series on anomaly detection in healthcare data. As described in part 1, our goal is to apply the personalized PageRank algorithm to detect anomalies in healthcare payment records, specifically the publicly available Medicare-B dataset.

In this blog post, we demonstrate the technical steps to compute the similarity graph between medical providers at scale, using HDP and Apache Pig.…
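The core of that computation can be sketched in a few lines of plain Python (the post itself uses HDP and Apache Pig at scale; this toy version, with hypothetical providers and procedure codes, only illustrates the idea of comparing providers by their procedure-count vectors):

```python
from collections import Counter
from math import sqrt

# Toy provider -> procedure-code counts (illustrative, not Medicare-B data).
providers = {
    "prov1": Counter({"99213": 50, "99214": 30}),
    "prov2": Counter({"99213": 45, "99214": 35, "93000": 5}),
    "prov3": Counter({"27447": 60}),  # a very different procedure mix
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Emit the similarity graph as weighted edges between provider pairs.
names = sorted(providers)
edges = [
    (u, v, cosine(providers[u], providers[v]))
    for i, u in enumerate(names)
    for v in names[i + 1:]
]
for u, v, w in edges:
    print(u, v, round(w, 3))
```

Providers with similar procedure mixes (prov1 and prov2) get a weight near 1, while unrelated ones (prov1 and prov3) get 0; at scale, only edges above a similarity threshold would typically be kept.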

On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica and Cindy Maike, General Manager, Insurance at Hortonworks, will be joining the Insurance Journal in a webinar on “How to Become an Analytics-Ready Insurer.”

Register for the Webinar on March 25th at 10am Pacific/1pm Eastern time

Josh and Cindy exchange perspectives on what “analytics ready” really means for insurers, and today we are sharing some of our views (join the webinar to learn more).…

Changes in technology and customer expectations create new challenges for how insurers engage their customers, manage risk information and control the rising frequency and severity of claims.

Carriers need to rethink traditional models for customer engagement. Advances in technology and the adoption of retail engagement models drive fundamental changes in how customers shop for and purchase insurance coverage. To engage with their customers, our insurance customers seek “omni-channel” insight and the ability to confidently recommend the next best action (NBA) to their customers.…

Paul Boal, Director of Data Management & Analytics at Mercy, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform and Mercy’s journey to the Data Lake.

Technology at Mercy

Mercy has long been committed to using technology to improve medical outcomes for patients. We were among the first health care organizations in the U.S. to have a comprehensive, integrated electronic health record (EHR) providing real-time, paperless access to patient information.…

Cisco and Hortonworks established their official alliance back in 2013. Together, they have been bringing to life the vision of a single big data platform for the enterprise. As every industry witnesses unprecedented quantities of data and a variety of new data types (e.g. clickstream and behavior, machine and sensor, geographic data, server logs, sentiment, and web…), Cisco and Hortonworks have been collaborating to empower companies with their data. Often, organizations need to optimize their IT infrastructure and free up their Enterprise Data Warehouse (EDW) to make the most of all of their data, building new analytic applications and moving toward the vision of the Data Lake.