The Hortonworks Blog

Posts categorized by: Hadoop

Forrester recently called Apache Hadoop adoption “mandatory” for the enterprise. For most organizations, moving forward with Hadoop is no longer a question of if, but when. Hadoop-powered insight into big data is enabling market disruption in every industry and the market winners are those who handle that data most effectively and at the lowest cost.

As with any new platform, making decisions on how best to implement and for what purpose can be challenging.…

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University.

Introduction

This is the second part of our blog-post series about anomaly detection from healthcare data. As described in part 1, our goal is to apply the personalized-PageRank algorithm to detect anomalies in healthcare payment records, specifically the publicly available Medicare-B dataset.

In this blog post, we demonstrate the technical steps to compute the similarity graph between medical providers at scale, using HDP and Apache Pig.…

On March 25th, Josh Lee, Global Director for Insurance Marketing at Informatica, and Cindy Maike, General Manager, Insurance at Hortonworks, will be joining the Insurance Journal in a webinar on “How to Become an Analytics-Ready Insurer.”

Register for the Webinar on March 25th at 10am Pacific/1pm Eastern time

Josh and Cindy exchange perspectives on what “analytics ready” really means for insurers, and today we are sharing some of our views (join the webinar to learn more).…

Apache Hive is the de facto standard for SQL in Hadoop, with more enterprises relying on this open source project than on any other alternative. Stinger.next, a community-based effort, is delivering true enterprise SQL at Hadoop scale and speed.

With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise users. They have come to expect fine-grained access control and auditing within Hive. Apache Ranger provides centralized security administration for Hadoop, and it enables fine-grained access control and deep auditing for Apache components such as Hive, HBase, HDFS, Storm and Knox.…

Paul Boal, Director of Data Management & Analytics at Mercy, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform and Mercy’s journey to the Data Lake.

Technology at Mercy

Mercy has long been committed to using technology to improve medical outcomes for patients. We were among the first health care organizations in the U.S. to have a comprehensive, integrated electronic health record (EHR) providing real-time, paperless access to patient information.…

HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide variety of engines. This functionality makes YARN particularly attractive for the integration of many distributed long-running services.

In this release, we also introduced a new framework, Apache™ Slider, for easy onboarding of long-running services on top of YARN.…

This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University.

Introduction

PageRank [1] is the poster child of graph algorithms, used by Google in its original search engine system to determine which web pages are most influential. The incredible success of PageRank led to increased interest and research in the field of graph algorithms, resulting in innovative extensions such as personalized PageRank [2].…

Cisco and Hortonworks established their official alliance back in 2013. Together, they have been bringing to life the vision of a single big data platform for the enterprise. As every industry is witnessing unprecedented quantities of data and a variety of new data types (e.g. clickstream and behavior, machine and sensor, geographic data, server logs, sentiment and web…), Cisco and Hortonworks have been collaborating to empower companies with their data. Oftentimes, organizations need to optimize their IT infrastructure and free up their Enterprise Data Warehouse (EDW) to make the most of all of their data, building new analytic applications and moving towards the vision of the Data Lake.

This is the second post in a series exploring the theme of long-running service workloads in YARN. See the introductory post in this series for background.

Long-running services deployed on YARN are by definition expected to run for a long period of time—in many cases forever. Services such as Apache™ HBase, Apache Accumulo and Apache Storm can be run on YARN to provide a layer of services to end users, and they usually have a central master running in conjunction with an ApplicationMaster (AM).…

An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure for big data. We find that organizations are looking for an open and flexible platform that enables them to deploy big data and Hadoop solutions on-premises, in the cloud and in a hybrid environment.

Microsoft and Hortonworks have joined forces to help simplify and ease the transformation of your current Apache Hadoop deployment to a hybrid cloud architecture.…

Analysts and data scientists, not to mention business executives, want Big Data not for the sake of the data itself, but for the ability to work with and learn from that data. As other users become more savvy, they also want more access. But too many inefficient queries can create a bottleneck in the system.

The good news is that Apache™ Hive 0.14—the standard SQL interface for processing, accessing and analyzing Apache Hadoop® data sets—is now powered by Apache Calcite.…

Managing online security for companies is a big task. In a world of increasing cyber threats, the risks to financial organizations are greater than they have ever been. Data breaches result not only in financial loss from data theft and misuse, but in significant reputation damage to the organizations that experience them. How can such organizations quickly and accurately identify risks to protect their data, their assets, and their customers? Threats to your network and vital data sets are constantly evolving to be more sophisticated, which makes them more difficult to detect, especially when you are relying on traditional tools.…

Leading enterprise organizations have concluded that YARN-enabled Hadoop is foundational to their modern data architectures. These companies subscribe to Hortonworks (and implement Hortonworks Data Platform) to bring additional types of data under management, merge those with legacy datasets, and unlock new business insight.

But don’t take our word for it.

Watch these brief videos and hear our customers describe how a data-first approach is transforming their businesses.

Advertising

Luminar is the leading big data analytics and modeling provider uniquely focused on delivering actionable insights on U.S.…

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster.

HDP 2.2 offers substantial innovations in Apache™ Hadoop YARN, enabling Hadoop users to efficiently store and interact with their data in a single repository, simultaneously using a wide variety of engines.…

Hortonworks provides enterprise Hadoop for the telecommunications service provider, and Hortonworks Data Platform (HDP) is architected from the ground up with the centralized YARN-based architecture and core enterprise services for data governance, security and cluster operations that can revolutionize your telecommunications business.

As the originators of Hadoop, leaders in the developer community, and partners for your success, nobody is better placed to help you become a data-centric telecommunications enterprise.

Hortonworks supports most of the largest North American carriers.…