The Hortonworks Blog

Posts categorized by: Operations & Management
YARN and Apache Storm: A Powerful Combination

YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside many others, coordinated by YARN.

YARN’s architecture makes it much easier for users to build and run multiple applications in Hadoop, all sharing a common resource manager.…
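To make the idea of a common resource manager concrete, here is a toy Python model. It is not YARN's actual API: it has a single pool of cluster memory, and every application must request containers from that pool, whatever engine the application uses. The class and method names (`ToyResourceManager`, `request`, `release`) and the memory figures are hypothetical. Real YARN scheduling also involves queues, vCores, and node locality.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare containers by identity, not by field values
class Container:
    app_id: str
    memory_mb: int


@dataclass
class ToyResourceManager:
    """Hypothetical shared resource manager: one memory pool for all engines."""
    capacity_mb: int
    containers: list = field(default_factory=list)

    def used_mb(self) -> int:
        # Total memory currently held by granted containers.
        return sum(c.memory_mb for c in self.containers)

    def request(self, app_id: str, memory_mb: int) -> Optional[Container]:
        # Grant the container only if the shared pool still has room.
        if self.used_mb() + memory_mb > self.capacity_mb:
            return None
        container = Container(app_id, memory_mb)
        self.containers.append(container)
        return container

    def release(self, container: Container) -> None:
        # Return the container's memory to the shared pool.
        self.containers.remove(container)


rm = ToyResourceManager(capacity_mb=8192)
mr = rm.request("mapreduce-job", 4096)
storm = rm.request("storm-topology", 2048)
tez_first = rm.request("tez-query", 4096)  # denied: only 2048 MB left
print(tez_first is None)  # True
rm.release(mr)
tez_retry = rm.request("tez-query", 4096)  # granted once MapReduce releases memory
print(tez_retry is not None)  # True
```

In this model, the engines never negotiate with each other. Each one asks the same arbiter for resources, and that is the property that lets MapReduce, Storm, and Tez share one cluster.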

This summer, Hortonworks presented the Discover HDP 2.1 Webinar series. Our developers and product managers highlighted the latest innovations in Apache Hadoop and related Apache projects.

We’re grateful to the more than 1,000 attendees whose questions added rich interaction to the pre-planned presentations and demos.

For those of you who missed one of the 30-minute webinars (or who want to review one you joined live), you can find recordings of all sessions on our What’s New in 2.1 page.…

This week we continue our YARN webinar series with a detailed introduction to Apache Tez and an overview for developers. Tez is designed to express fit-to-purpose data processing logic, and it enables batch and interactive data processing applications spanning terabyte- to petabyte-scale datasets. Its customizable execution architecture lets developers express complex computations as dataflow graphs. It also allows dynamic performance optimizations based on real information about the data and the resources required to process it.…
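Tez's own API is in Java. The following Python sketch only illustrates what "expressing a computation as a dataflow graph" means. Each vertex is a hypothetical processing function, each edge names the upstream vertices a vertex consumes, and a small runner executes the vertices in topological order using the standard library's `graphlib` (Python 3.9+). The word-frequency job is an invented example, and Tez's runtime optimizations are not modeled.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical vertices: each receives a dict of its upstream vertices' outputs.
vertices = {
    "read": lambda up: ["to be or not", "to be"],
    "tokenize": lambda up: [w for line in up["read"] for w in line.split()],
    "count": lambda up: Counter(up["tokenize"]),
    "total": lambda up: len(up["tokenize"]),
    "frequency": lambda up: {w: n / up["total"] for w, n in up["count"].items()},
}

# Edges: vertex -> set of upstream vertices it consumes from.
# "count" and "total" both fan out from "tokenize" and fan back in at "frequency".
dag = {
    "read": set(),
    "tokenize": {"read"},
    "count": {"tokenize"},
    "total": {"tokenize"},
    "frequency": {"count", "total"},
}


def run(dag, vertices):
    """Execute every vertex after all of its upstream vertices have finished."""
    outputs = {}
    for name in TopologicalSorter(dag).static_order():
        upstream = {dep: outputs[dep] for dep in dag[name]}
        outputs[name] = vertices[name](upstream)
    return outputs


result = run(dag, vertices)
print(result["frequency"])
```

Because the job is described as a graph and not as a fixed chain of map and reduce stages, independent branches such as `count` and `total` are visible to the runner. A real engine could schedule them in parallel or reshape them at runtime.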

Hortonworks Software Engineers Vinod Kumar Vavilapalli (Apache Hadoop YARN committer) and Jian He (Apache Hadoop YARN committer) discuss the resiliency of the Apache Hadoop YARN ResourceManager upon restart in this blog. This is their third post in our series on the motivations and architecture behind improvements to the resiliency of the Apache Hadoop YARN ResourceManager (RM). Others in the series are:

  • Introduction
  • Phase II – Preserving work-in-progress of running applications

ResourceManager restart is a critical feature that allows YARN applications to continue functioning even when the ResourceManager (RM) crashes and restarts for any reason.…
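This minimal Python sketch shows the idea behind RM restart. It assumes a hypothetical in-memory `ToyStateStore` in place of durable storage; YARN itself can persist RM state to stores such as ZooKeeper or a file system like HDFS. In the sketch, the RM writes each application's state to the store before acknowledging it, so a freshly started RM can rebuild its view of all applications. It models only application-state recovery. Preserving running containers (the work-preserving phase) additionally requires NodeManagers to resync with the new RM, which is not shown.

```python
import json


class ToyStateStore:
    """Hypothetical stand-in for durable storage that outlives the RM process."""

    def __init__(self):
        self._data = {}

    def put(self, app_id, state):
        # Serialize so the stored copy shares nothing with the RM's objects.
        self._data[app_id] = json.dumps(state)

    def load_all(self):
        return {app_id: json.loads(s) for app_id, s in self._data.items()}


class ToyRM:
    """Hypothetical RM that recovers application state on (re)start."""

    def __init__(self, store):
        self.store = store
        # On startup, rebuild application state from whatever was persisted.
        self.apps = store.load_all()

    def submit(self, app_id, queue):
        state = {"queue": queue, "status": "RUNNING"}
        self.store.put(app_id, state)  # persist before acknowledging the client
        self.apps[app_id] = state

    def finish(self, app_id):
        self.apps[app_id]["status"] = "FINISHED"
        self.store.put(app_id, self.apps[app_id])


store = ToyStateStore()
rm = ToyRM(store)
rm.submit("app_0001", "default")
rm.submit("app_0002", "etl")
rm.finish("app_0001")

del rm             # simulate the RM process crashing
rm = ToyRM(store)  # the restarted RM recovers from the store
print(rm.apps)
```

The key design choice is ordering: the state is written to the store before the RM acts on it. As a result, anything a client has been told about can be reconstructed after a crash.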

HP and Hortonworks recently announced a strategic partnership that included a $50 million equity investment by HP. While the investment is important, there is an equally important joint commitment to help accelerate the adoption of Enterprise Apache Hadoop by deeply integrating the Hortonworks Data Platform (HDP) with the HP HAVEn big data platform.

Below are some thoughts on our joint work from the HP OMi Team…

The first area of joint engineering strategy between our companies will be to integrate Apache Ambari with HP Operations Manager i (OMi), which provides tools and APIs to provision, manage and monitor Hadoop clusters.…

“Data is to information society what fuel was to the industrial economy: the critical resource powering the innovations that people rely on,” write Viktor Mayer-Schönberger and Kenneth Cukier in Big Data. According to Forrester, big data today fuels and engenders the innovation of new products and services.

Just as countries’ fuel repositories need protection and security because they can come under attack, so do companies’ big data repositories. “Companies, markets, and countries are increasingly under attack from cyber-criminals.…

It’s been a busy year for Apache Ambari. Keeping up with the rapid innovation in the open community certainly is exciting. We’ve already seen six releases this year to maintain a steady drumbeat of new features and usability guardrails. We have also seen some exciting announcements of new folks jumping into the Ambari community.

With all these releases and community activities, let’s take a break to talk about how the broader Hadoop community is affecting Ambari and how this is influencing what you will see from Ambari in the future.…

Apache Hadoop has come a long way. From its early days as a platform to index the web, it has evolved to offer interactive, real-time, and batch processing capabilities spanning gigabytes to petabytes of content. A key stepping stone in this evolution has been Apache Hadoop YARN. YARN has enabled enterprises to onboard “fit for purpose” processing engines to their Hadoop Data Lake. This has opened the Data Lake to rapid and unbridled innovation by the ISV community and delivered differentiated insight to the enterprise.…

SequenceIQ provides an API and platform to build predictive applications and turn data into tangible assets. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) explains why his team chose Apache Ambari for provisioning Hadoop clusters and how they contributed to the Ambari project.

At SequenceIQ, we frequently provision Hadoop clusters in different environments. For a long time, we searched for the right provisioning and management tool.…

StackIQ, a Hortonworks technology partner, offers a comprehensive software suite that automates the deployment, provisioning, and management of Big Infrastructure. In this guest blog, Anoop Rajendra (@anoop_r), a Senior Software Developer at StackIQ, gives instructions for using StackIQ Cluster Manager to deploy Apache Ambari on a cluster running Hortonworks Data Platform (HDP).

Provisioning, managing and monitoring an Apache™ Hadoop cluster can be challenging. With this in mind, the engineers at Hortonworks introduced the Apache Ambari project into the Apache Software Foundation.…

Apache Hadoop clusters grow and change with use. Maybe you used Apache Ambari to build your initial cluster with a base set of Hadoop services targeting known use cases and now you want to add other services for new use cases. Or you may just need to expand the storage and processing capacity of the cluster.

Ambari can help in both scenarios. In this blog, we’ll cover a few different ways that Ambari can help you expand your cluster.…

Earlier this month, the Apache Ambari community released Apache Ambari 1.6.1, which includes multiple improvements for performance and usability. The momentum in and around the Ambari community is unstoppable. Today we saw the Pivotal team lean in to Ambari. This is also the sixth release of this critical component in 2014, proving again that open source is the fastest path to innovation.

Many thanks to the broad Ambari community, whose wealth of contributions resulted in 585 JIRA issues being resolved in this release.…

Vendors and users alike have contributed many projects to the Apache Software Foundation (ASF) that greatly expand Apache Hadoop’s capabilities as an enterprise data platform.

While Hadoop, with YARN at its architectural center, provides the foundational capabilities for managing and accessing data at scale, a broader blueprint for Enterprise Hadoop has emerged. It specifies how this array of Apache projects fits across five distinct pillars to form a complete enterprise data platform: data access, data management, security, operations and governance.…

Hadoop is a business-critical data platform at many of the world’s largest enterprises. These corporations require a layered security model focusing on four aspects of security: authentication, authorization, auditing, and data protection. Hortonworks continues to innovate in each of these areas, along with other members of the Apache open source community. In this blog, we will look at the authentication layer and how we can enforce strong authentication in HDP via Kerberos.…

Hadoop Summit Content Curation

Although Hadoop Summit San Jose 2014 has come and gone, its invaluable content (keynotes, sessions, and tracks) remains available online. I’ve selected a few sessions below for Hadoop system administrators and dev-ops, curating them under a general Hadoop operations theme.

Dev-ops engineers and system administrators know better than anyone that ease of operations and deployment can make or break a large Hadoop production cluster. That is why they care about all of the following:

  • how rapidly they can create or replicate a cluster;
  • how efficiently they can manage or monitor at scale;
  • how easily and programmatically they can extend or customize their operational scripts; and
  • how accurately they can foresee, forestall, or forecast resource starvation or capacity stipulation.