The Hortonworks Blog

Hortonworks proudly announces the launch of a new education program for Academic Institutions. This program was created to introduce students to the Hortonworks Data Platform (HDP) and to provide them with the necessary technical skills to complement their chosen academic curriculum.

Accredited colleges and universities around the world are invited to apply to become a Hortonworks Academic Partner, allowing them to incorporate our course materials into their classrooms at a low cost to students.…

The Apache Accumulo community has announced its 1.7.0 release. As the community’s first major release of 2015, it represents the culmination of a year of effort from many Accumulo committers and contributors. Apart from the many notable changes enumerated below, Accumulo is now well integrated with Apache Ambari.

In this release, 43 different individuals fixed 691 JIRA issues, and we thank everyone who helped in any way to make Apache Accumulo 1.7.0 a reality.…

Hadoop really is everywhere. In his recent post, “Going from Hadoop Adoption to Hadoop Everywhere,” Shaun Connolly made this point and also quoted Forrester’s Mike Gualtieri:

Hadoop is a must-have for large enterprises

Shaun mentioned these key trends in his post:

  • Hadoop is transforming every industry
  • Enterprises are building applications to make use of all kinds of data
  • Hadoop is ready for the enterprise

Earlier this month, we released Hortonworks’ first quarter earnings.…

In this guest blog, IDC Program Director for Retail Insights Greg Girard shares his insights on how retailers employ big data and analytics to drive decision and action across myriad industries.

Big data and analytics (BDA) have become top agenda items for a growing number of retail executives, and rightly so in the broader social and economic context of data-enabled decision and action. While “data-driven,” as a term, has been around for quite some time, the ability to act on insight has taken on new urgency.…

SQL is the most popular use case for the Hadoop user community, and Apache Hive is still the de facto standard. Earlier this week, the Apache Hive community released Apache Hive 1.2.0.

This is already the third release this year, and the Hive developer community continues to improve the release and grow its team, with 11 Hive contributors promoted to committers in the last three months. Dedicated to making Hive enterprise-ready, the community has made improvements in the following areas:

  • Additional SQL functionality
  • Security enhancements
  • Performance gains
  • Stability and usability

For the complete list of features, improvements, and bug fixes, see the release notes.…

    Bit Refinery is a Hortonworks Technical Partner and was recently certified with HDP. Bit Refinery is a VMware© Cloud Infrastructure-as-a-Service (IaaS) provider featuring virtualization technology hosted within its fully redundant virtual data centers. Bit Refinery offers a hosted Hortonworks Sandbox, providing an easy way to experience and learn Hadoop. All the tutorials available from the Hortonworks Sandbox work just as if you were running a local version of the Sandbox.…

    This is the third post in a series that explores the theme of enabling diverse workloads in YARN. See our introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2, and a related post on CPU scheduling.

    Introduction

    One of the core responsibilities of YARN is monitoring and limiting resource usage of application containers. When it comes to resource management there are two parts:

  • Resource allocation: Application containers should be allocated on nodes that have the required resources and
  • Enforcement and isolation of Resource usage: Containers should only be allowed to use the resources they get allocated on a NodeManager (NM).

    TU-Automotive Detroit (formerly Telematics Detroit) is the premier industry show focused on connected car and telematics, and Hortonworks is proud to be a Platinum Sponsor of the conference. We hope you can visit us at the show to learn more about Hadoop for the connected car and infotainment in the vehicle.

    Register for TU-Automotive

    Hortonworks counts some of the world’s premier automakers among its subscribers, and at TU-Automotive Detroit, on Wednesday, June 3, Hortonworks President Herb Cunitz will deliver a keynote presentation, “Leveraging Telematics Data in a Connected World,” that will discuss some common automotive use cases.…

    Kristen Hardwick, Vice President of Big Data Solutions at Spry, Inc., is our guest blogger. In this blog, Kristen shares performance analysis from Spry’s evaluation of Apache Hive with Tez as a fast query engine.

    In early 2014, Spry developed a solution that heavily utilized Hive for data transformations. When the project was complete, three distinct data sources were integrated through a series of HiveQL queries using Hive 0.11 on HDP 2.0.…

    With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises utilize Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster need to be highly available in the face of failures and continue to function while the cluster is upgraded to the latest software version.

    With the Hortonworks Data Platform (HDP) 2.2, we have enhanced the core platform packaging to put in place support for rolling upgrades of the HDP stack while the cluster is actively servicing users.…

    This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2.

    Introduction

    When it comes to managing resources in YARN, there are two aspects that we, the YARN platform developers, are primarily concerned with:

  • Resource allocation: Application containers should be allocated on the best possible nodes that have the required resources and
  • Enforcement and isolation of Resource usage: On any node, don’t let containers exceed their promised/reserved resource allocation

    From its beginning in Hadoop 1, all the way to Hadoop 2 today, the compute platform has always supported memory-based allocation and isolation.…

    All segments of the oil and gas industry are adopting Hadoop, from exploration through to drilling, production, transportation, refining, and retail.

    The Hortonworks Oil and Gas team will be demonstrating some of the Hadoop-based advanced analytics applications for the upstream oil and gas industry at PNEC Houston (the International Conference on Petroleum Data Integration, Information, and Data Management) running from May 19-21.

    A Transformation in O&G

    On a daily basis, the geological and geophysical discipline in upstream oil and gas must deal with a significant number of disparate datasets.…

    In this guest blog, Sumeet Kumar Agrawal, principal product manager for the Big Data Edition product at Informatica, explains how Informatica’s Big Data Edition integrates with Hortonworks’ security projects, and how you can secure your big data projects.

    Many companies already use big data technology like Hadoop for their production environments, so they can store and analyze petabytes of data including transactional data, weblog data, and social media content to gain better insights about their customers and business.…

    Historically, the strength of a platform lies in the ability of developers to learn, try, and build against the platform APIs and capabilities. As Apache Hadoop matures as a platform, it’s the creativity and efforts of the developer community that are driving the innovation that makes Hadoop a vibrant and impactful foundation of a modern data architecture.

    A successful developer community leads to a successful platform, and at Hortonworks we are committed to reducing the friction to speed up the success of our customers.…