The Hortonworks Blog

This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2.

Introduction

When it comes to managing resources in YARN, there are two aspects that we, the YARN platform developers, are primarily concerned with:

  • Resource allocation: Application containers should be allocated on the best possible nodes that have the required resources, and
  • Enforcement and isolation of resource usage: On any node, don’t let containers exceed their promised/reserved resource allocation.

From its beginning in Hadoop 1, all the way to Hadoop 2 today, the compute platform has always supported memory-based allocation and isolation.…

    All segments of the oil and gas industry are adopting Hadoop, from exploration through to drilling, production, transportation, refining, and retail.

    The Hortonworks Oil and Gas team will be demonstrating some of the Hadoop-based advanced analytics applications for the upstream oil and gas industry at PNEC Houston (the International Conference on Petroleum Data Integration, Information, and Data Management) running from May 19-21.

    A Transformation in O&G

    On a daily basis, the geological and geophysical discipline in upstream oil and gas must deal with a significant number of disparate datasets.…

    In this guest blog, Sumeet Kumar Agrawal, principal product manager for the Big Data Edition product at Informatica, explains how Informatica’s Big Data Edition integrates with Hortonworks’ security projects, and how you can secure your big data projects.

    Many companies already use big data technology like Hadoop for their production environments, so they can store and analyze petabytes of data including transactional data, weblog data, and social media content to gain better insights about their customers and business.…

    Historically, the strength of a platform lies in the ability of developers to learn, try, and build against the platform APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and efforts of the developer community that are driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern data architecture.

    A successful developer community leads to a successful platform, and at Hortonworks we are committed to reducing the friction to speed up the success of our customers.…

    With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. The Apache Knox Gateway (“Knox”) provides HTTP-based access to resources of the Hadoop ecosystem so that enterprises can confidently extend Hadoop access to more users while maintaining compliance with enterprise security policies.…

    Since the launch of Hortonworks Data Platform (HDP) three years ago, we have seen firsthand how enterprises are embracing Apache Hadoop to enable their modern data architectures and power new analytics applications. Hadoop is helping organizations transform their business by providing them with a pervasive, enterprise-ready data platform to meet their big data challenges.

    Apache Hadoop’s ability to process any data (e.g., clickstream, web and social, IoT) allows an enterprise to derive insights in ways that were previously not technologically or economically possible.…

    It’s been a busy few weeks here at Hortonworks, and much of that busyness comes from all of the things we’ve been doing with our partners. This has been a stretch of time that we’ve affectionately been calling May-magedon, with nine major partner-related events in a two-and-a-half-week span. We love telling the story of the transformative nature of Apache Hadoop along with the increasing pervasiveness of enterprise Hadoop driven through a vibrant ecosystem.…

    Next week, in Las Vegas, thousands of attendees will join Informatica World to explore just how far data can take them. Many companies already rely on massive volumes of internal and external data to create new insights and build innovative and profitable business models. Where are you on your journey?

    To learn more about how Hortonworks and Informatica partner to optimize the entire big data supply chain on Hadoop and help you turn data into actionable information that drives business value, join the following sessions:

    • On Tuesday, May 12, during the Big Data Ready Summit, John Kreisa, VP Strategic Marketing at Hortonworks, will be part of the Succeeding with Big Data and Avoiding the Pitfalls panel.

    Two weeks ago, Apache ORC became a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project, and it reflects the momentum built by a broad community of developers.

    What is ORC and why is it useful?

    Back in January 2013, we created ORC files as part of the Stinger initiative to massively speed up Apache Hive and improve the storage efficiency of data stored in Apache Hadoop.…

    The connected and collected vehicle data, emitted through embedded smart sensors, are transforming the automotive industry. Is this hype or reality?

    To discuss the reality of this transformation, to tackle management of streams of data from connected cars, and to share new data architectures that process, manage, and analyze volumes of data, automakers and key industry innovators will gather in Berlin for Telematics Berlin 2015 on May 11-12.

    Data Deluge

    Because legacy architectures have limited capacity to store streams of unstructured and varied data at petabyte scale, and lack the ability to analyze data in real time to offer value and insights, automakers are looking to next-generation data platforms.…

    Hortonworks subscribers across all major industries use Hortonworks Data Platform (HDP) to power advanced analytics applications for data discovery and predictive analytics. The insurance industry uses Hadoop to better leverage unstructured information to strengthen subrogation opportunities, stop fraud and minimize claims leakage. This requires new capabilities for data discovery.

    Cindy Maike is the GM for Insurance Solutions at Hortonworks, and next week she will be a panelist on the use of analytics in claims at the inaugural Analytics for Insurance Canada 2015 event.…

    On May 22, 2015, Hortonworks and codecentric AG will host a free Community Day as part of their long-standing partnership. The agenda includes field reports from day-to-day enterprise practice and the latest developments from the European Hadoop Summit 2015, including Spark-on-YARN, Apache Zeppelin, and the Hadoop enterprise features for security and data governance.

    Exciting big data projects from “Atlas” to “Zeppelin”

    User talks:

    Florian Herrmann and Daniel Schmitt of Fiducia IT AG will provide an exciting start: they will demonstrate how the Volks- und Raiffeisenbanken implemented a Lambda architecture for detecting fraud attempts.…

    This is the third post in a series that explores the theme of supporting rolling upgrades and downgrades of a Hadoop YARN cluster. See the introductory post here.

    Background and Motivation

    Before HDP 2.2, Hadoop MapReduce applications depended on MapReduce jars being deployed on all the nodes in a cluster. The Java classpath of all the tasks and the ApplicationMaster of a MapReduce job was set to point to the deployed jars.…

    Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment.

    Tez View: Understand and Optimize Jobs in your Cluster

    The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify which jobs consume the most resources and which are the best candidates to optimize.

    With the Tez View you can quickly spot Hive or Pig jobs that are taking the longest, writing the most data or consuming the most CPU.…

    Argyle Data is a Hortonworks Technology Partner that was recently certified on the Hortonworks Data Platform (HDP) and awarded the OPS Ready badge for its integration with Apache Ambari. Here, Dr. Ian Howells talks about how Argyle Data is helping customers detect fraud faster with its native Hadoop application.

    We believe that the world is moving to a new generation of native Apache Hadoop applications. When you build your application from the ground up on Hadoop, it is critical to make it simple for any organization to provision, manage and monitor at scale.…