The Hortonworks Blog

Posts categorized by : YARN

The power of a well-crafted speech is indisputable, for words matter—they inspire to act. And so is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter—they simplify to write code.

In 2007, when Chris Wensel, the author of Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write Enterprise Big Data applications in MapReduce will be difficult and convincing developers to write directly to the MapReduce API was a potential blocker.…

The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.

Download HDP 2.1 Technical Preview Now

What’s In Hortonworks Data Platform 2.1?

The advancements in HDP 2.1 span every aspect of Enterprise Hadoop: from data management, data access, integration & governance, security and operations. …

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.3.0!

hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS.

With this release, there are two significant enhancements to HDFS:

  • Support for Heterogeneous Storage Hierarchy in HDFS (HDFS-2832)
  • In-memory Cache for data resident in HDFS via Datanodes (HDFS-4949)

With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters.…

Earlier this week Microsoft announced via their blog that a new version of Windows Azure HDInsight is available in public preview.

Microsoft recognizes the importance of the technical innovation in and around YARN as well as Hortonworks leadership in this area and we have worked collaboratively to bring Hadoop 2.2 to Azure via our Hortonworks Data Platform 2.0 for Windows release.

Apache Hadoop YARN is the data operating system for Hadoop and greatly expands the applications possible of this emerging technology by allowing multiple processing frameworks such as streaming or graph processing to plug in natively.…

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how those will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second data is created and then for decades of historical analysis of that data stored in HDFS.…

This guest blog post is from Syncsort, a Hortonworks Technology Partner and certified on HDP 2.0, by Keith Kohl, Director, Product Management, Syncsort (@keithkohl)

Several years ago, Syncsort set on a journey to contribute to the Apache Hadoop projects to open and extend Hadoop, and specifically the MapReduce processing framework.  One of the contributions was to open the sort – both map side sort and reduce side – and to make it pluggable. …

Last week was a busy week for shipping code, so here’s a quick recap on the new stuff to keep you busy over the holiday season.

There is a lot of information available on the benefits of Apache YARN but how do you get started building applications? On December 18 at 9am Pacific Time, Hortonworks will host a webinar and go over just that:  what independent software vendors (ISVs) and developers need to do to take the first steps towards developing applications or integrating existing applications on YARN.

Register for the webinar here.

Why YARN?

As Hadoop gains momentum it’s important to recognize the benefits to customers and the competitive advantage software vendors will have if their application is integrated with YARN like elasticity, reliability and efficiency.…

Hortonworks customers can now enhance their Hadoop applications with Elasticsearch real-time data exploration, analytics, logging and search features, all designed to help businesses ask better questions, get clearer answers and better analyze their business metrics in real-time.

Hortonworks Data Platform and Elasticsearch make for a powerful combination of technologies that are extremely useful to anyone handling large volumes of data on a day-to-day basis. With the ability of YARN to support multiple workloads, customers with current investments in flexible batch processing can also add real-time search applications from Elasticsearch.…

User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application – correctness problems with the application itself, race conditions when running on a cluster, and debugging task/job failures due to hardware or platform bugs. Secondly, one can do historical analyses of the logs to see how individual tasks in job/workflow perform over time. One can even analyze the Hadoop MapReduce user-logs using Hadoop MapReduce(!) to determine any performance issues.…

Join Hortonworks and Pactera for a Webinar on Unlocking Big Data’s Potential in Financial Services Thursday, November 21st at 12:00 EST.

Have you ever had your debit or credit card declined for seemingly no reason? Turns out, the rejections are not so random. Banks are increasingly turning to analytics to predict and prevent fraud in real-time. That can sometimes be an inconvenience for customers who are traveling or making large purchases, but it’s necessary inconvenience today in order for banks to reduce billions in losses due to fraud.…

This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the ninth post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introduction

In the previous post, we explained the basic concepts of LocalResources and resource localization in YARN.…

When I first started to understand what YARN is, I wanted to build an application to understand its core. There was already a great example YARN application called Distributed Shell that I could use as the shell (pun intended) for my experiment. Now I just needed an existing application that could provide massive reuse value by other applications. I looked around and I decided on MemcacheD.

This brief guide shows how to get MemcacheD up and running on YARN – MOYA if you will…

Prerequisites

You’re going to need a few things to get the sample application operational.…

With the attention of the Hadoop community on Strata/Hadoop World in New York this week, it’s seems an appropriate time to give everyone an early update on continued community development of Apache Hive. This progress well and truly cements Hive as the standard open-source SQL solution for the Apache Hadoop ecosystem for not just extremely large-scale, batch queries but also for low-latency, human-interactive queries.

You can catch me at our session ‘Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop’ along with Owen and Alan where we’ll be happy to dive into more of the details.…

The Hadoop Distributed File System is the reliable and scalable data core of the Hortonworks Data Platform. In HDP 2.0, YARN + HDFS combine to form the distributed operating system for your Data Platform, providing resource management and scalable data storage to the next generation of analytical applications.

Over the past six months, HDFS has introduced a slew of major features to HDFS covering Enterprise Multi-tenancy, Business Continuity Processing and Enterprise Integration:

  • Enabled automated failover with a hot standby and full stack resiliency for the NameNode master service
  • Added enterprise standard NFS read/write access to HDFS
  • Enabled point in time recovery with Snapshots in HDFS
  • Wire Encryption for HDFS Data Transfer Protocol

Looking forward, there are evolving patterns in Data Center infrastructure and Analytical applications that are driving the evolution of HDFS.…

Go to page:12345

Thank you for subscribing!