Viewing posts by: Mark Harring

The world’s top authorities on Apache Hadoop convene at Hadoop Summit San Jose, and one of the top questions they will address is the future and direction of Hadoop. Sanjay Radia, founder and architect at Hortonworks, led the track, which selected 13 sessions on this topic. I asked Sanjay what he hoped would […]

Since Hortonworks and SAS formed their partnership, we have created some awesome assets (e.g., a SAS Data Loader sandbox tutorial, educational webinars and an array of blogs) that give Hadoop and Big Data enthusiasts hands-on training with Apache Hadoop and SAS’ powerful analytics solutions. You can find more details about our partnership and resources here: https://hortonworks.com/partner/sas. To continue […]

Yahoo! JAPAN needed a data platform that could scale to generate 100,000 reports per day and process large amounts of data. It needed to keep the last 13 months’ worth of data, approximately 500 billion rows, organized and easily accessible. Relational Database Management Systems (RDBMS) cannot scale […]

Symantec helps consumers and organizations secure and manage their information-driven world by protecting digital information and online transactions. The Symantec Cloud Platform team turned to Hortonworks to ingest an enormous volume of security logs, analyze that security metadata and then use that insight to protect its customers. Symantec now analyzes threat data much more quickly […]

In this Hortonworks partner guest blog, Jorik Blaas, chief technical officer at SynerScope, explores a use case in a new class of exploratory analytics, using Apache Spark on YARN, HDP and SynerScope.
Preliminaries
SynerScope is a pioneering developer of fast, sense-making Big Data Analytics technology. Focusing on human-in-the-loop analytics, we excel at combining heterogeneous data […]

In this Hortonworks partner guest blog, Abhimanyu Aditya, Senior Product Manager and co-founder at Skytree, explains how Skytree APIs solve challenges facing data engineers and simplify data preparation and data transformation, using Apache Spark on YARN with Hortonworks Data Platform (HDP).
Challenges Facing Data Engineers and Data Scientists
Machine learning as a technology can be challenging. […]

Last week, on July 22nd, we announced the general availability of HDP 2.3. In this three-part blog series, the first blog summarized the key innovations in the release (ease of use and enterprise readiness) and how they help deliver transformational outcomes, while the second blog focused on data access innovation. In this final part, we […]

Mayank Bansal of eBay is a guest contributing author of this collaborative blog. This is the 4th post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of Apache Hadoop YARN in HDP.
Background
In Hadoop YARN’s […]

Introduction
Multihoming is the practice of connecting a host to more than one network. It is frequently used to provide network-level fault tolerance: if hosts can communicate on more than one network, the failure of one network will not render them inaccessible. There are other use cases for multihoming as […]

Not a day passes without someone tweeting or re-tweeting a blog on the virtues of Apache Spark. At a Memorial Day BBQ, an old friend proclaimed: “Spark is the new rub, just as Java was two decades ago. It’s a developers’ delight.” Spark as a distributed data processing and computing platform offers much of what […]

This is the third post in a series that explores the theme of enabling diverse workloads in YARN. See our introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2, and a related post on CPU scheduling.
Introduction
One of the core responsibilities of YARN is monitoring and […]

This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2.
Introduction
When it comes to managing resources in YARN, there are two aspects that we, […]

Historically, the strength of a platform lies in the ability of developers to learn, try, and build against the platform’s APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and effort of the developer community that are driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern […]

This is the 3rd post in a series that explores the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. See the introductory post here.
Background and Motivation
Before HDP 2.2, Hadoop MapReduce applications depended on MapReduce jars being deployed on all the nodes in a cluster. The Java classpath of all the […]

Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment.
Tez View: Understand and Optimize Jobs in your Cluster
The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify […]