The Hortonworks Blog

At the beginning of February, HP announced their intent to acquire Voltage Security to expand data encryption security solutions for Cloud and Big Data. Today, both companies share their thoughts about the acquisition. Carole Murphy, Director Product Marketing at Voltage Security, and Albert Biketi, Vice President and General Manager at HP Atalla, tell us more about how HP extends the capabilities of every product in the Voltage portfolio, including Voltage’s leadership in securing Hadoop data with data-centric, standards-based technologies.…

Today EMC is launching their EMC® Business Data Lake solution, the first fully engineered, enterprise-grade solution for a Data Lake running on EMC infrastructure. At Hortonworks, we’ve been assisting customers on their journey to a data lake via a Modern Data Architecture (MDA). Our vision and EMC’s are highly complementary, so we’re delighted to be part of the EMC Business Data Lake.

The Data Lake, enabled by a Modern Data Architecture, allows an enterprise to become a Data-First Enterprise.…

We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider gathered to hear what’s happening.

There were two presenters. To set the context for the audience, Steve Loughran, member of technical staff at Hortonworks, delivered an extemporaneous high-level overview of Apache Slider within the Apache Hadoop YARN framework.

Running Dockerized Applications on YARN via Slider

Yu “Thomas” Liu gave a demo of his hot-off-the-IDE Docker deployment work.…

On Tuesday, March 24th, at 10am Pacific Time, Duane Lyons, Practice Lead at Clarity Solution Group, will join me, Eric Thorsen, Hortonworks General Manager of Retail and Consumer Products, to discuss “Consumer720”. During our one-hour webinar, Duane and I will lay out the details of using Hortonworks Data Platform (HDP) to enrich a single view of your customers with social media and clickstream feeds to achieve a new level of consumer engagement.…

We are excited to announce the general availability of Hortonworks Sandbox on Microsoft Azure. Hortonworks Sandbox is already a very popular environment for Developers, Data Scientists and Administrators to learn and experiment with the latest innovations in Hortonworks Data Platform.

The hundreds of innovations span Hadoop, Kafka, Storm, Hive, Pig, YARN, Ambari, Falcon, Ranger and the other components that make up HDP. We also provide tutorials to help you get a jump start on using HDP to implement a Modern Data Architecture at your organization.…

Introduction

Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and to continue functioning flawlessly while software is being upgraded. In the past, the Hadoop community has added enterprise features such as High Availability (HA) to various components of the stack, snapshots, improved disaster recovery, etc.…

Today, we’re delighted to have a guest blog post from Cameron Peek, who leads Partnership and Strategic Sales at CSC, one of our Global System Integration and Hortonworks Data Platform Resellers.

In the spirit of providing our clients innovative, thought-provoking, actionable content for consideration, Computer Sciences Corporation (CSC) is thrilled to present a two-part webinar series with our global partner, Hortonworks. In 2015, we find most of our clients have moved beyond exploring “What is big data?” and “How can I use big data?” and instead are now focused on “How can I ensure I am successful in my big data projects and see results quickly?”

The first webinar in the series will be presented this Thursday, March 19th at 10 am PST (register here) and will help attendees identify actionable next steps in their data analytics projects, no matter where they are today.…

Forrester recently called Apache Hadoop adoption “mandatory” for the enterprise. For most organizations, moving forward with Hadoop is no longer a question of if, but when. Hadoop-powered insight into big data is enabling market disruption in every industry, and the market winners are those who handle that data most effectively and at the lowest cost.

As with any new platform, deciding how best to implement it, and for what purpose, can be challenging.…

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University.

Introduction

This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to detect anomalies in healthcare payment records, specifically the publicly available Medicare-B dataset.

In this blog post, we demonstrate the technical steps to compute the similarity graph between medical providers at scale, using HDP and Apache Pig.…

On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica, and Cindy Maike, General Manager, Insurance at Hortonworks, will join the Insurance Journal in a webinar on “How to Become an Analytics-Ready Insurer.”

Register for the Webinar on March 25th at 10am Pacific/1pm Eastern time

Josh and Cindy exchange perspectives on what “analytics ready” really means for insurers, and today we are sharing some of our views (join the webinar to learn more).…

Changes in technology and customer expectations create new challenges for how insurers engage their customers, manage risk information and control the rising frequency and severity of claims.

Carriers need to rethink traditional models for customer engagement. Advances in technology and the adoption of retail engagement models drive fundamental changes in how customers shop for and purchase insurance coverage. To engage with their customers, our insurance customers seek “omni-channel” insight and the ability to confidently recommend the next best action (NBA) to their customers.…

Apache Hive is the de facto standard for SQL in Hadoop, with more enterprises relying on this open source project than on any alternative. Stinger.next, a community-based effort, is delivering true enterprise SQL at Hadoop scale and speed.

With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise users. They have come to expect fine-grained access control and auditing within Hive. Apache Ranger provides centralized security administration for Hadoop, and it enables fine-grained access control and deep auditing for Apache components such as Hive, HBase, HDFS, Storm and Knox.…

Paul Boal, Director of Data Management & Analytics at Mercy, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform and Mercy’s journey to the Data Lake.

Technology at Mercy

Mercy has long been committed to using technology to improve medical outcomes for patients. We were among the first health care organizations in the U.S. to have a comprehensive, integrated electronic health record (EHR) providing real-time, paperless access to patient information.…

“Start with the business problem!” That’s Sanjay’s advice when it comes to building a successful Big Data solution. For those of you who have missed the first part of this video series, Sanjay Krishnamurthi, SVP and Chief Technology Officer at Informatica, and Shaun Connolly, Vice President Corporate Strategy at Hortonworks, address a number of hot Big Data topics throughout a series of nine videos.

Today, they talk about how Big Data projects need to be driven by the business and how IT solutions and frameworks such as Hadoop have to be integrated with the rest of the data systems.…

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed Long-Running services.

In this release, we also introduced a new framework, Apache™ Slider, for easy onboarding of long-running services on top of YARN.…