The Hortonworks Blog

Having just returned from our Hadoop Summit Europe event, I was struck by the number of sessions in which large-scale businesses outlined the impact of their advanced analytic applications (built on Hadoop) and how those analytics enable better business decisions.

The story of business value is significant. Session after session, representatives from various industries described how their modern data architectures with Hadoop led to increased agility, innovative new customer experiences, and lower cost structures.…

On April 30, learn from experts at Hortonworks, Cisco, and Red Hat about accelerating the implementation of a scalable, cost-efficient and robust Big Data solution. Here is a sneak preview of what you’ll hear from our speakers:

  • Ali Bajawa, Senior Partner Solution Engineer, Hortonworks
  • Ron Graham, System Engineer for Big Data Analytics, Cisco
  • Irshad Raihan, Senior Principal, Big Data Product Marketing, Red Hat


1. What should a company consider when looking for a big data solution?…

The Apache Hadoop community is happy to announce the release of Apache Hadoop 2.7.0! We want to express our gratitude to every contributor, reviewer and committer.

The Hadoop community fixed 923 JIRAs in total as part of the 2.7.0 release. Of the 923 fixes:

  • 259 were in Hadoop Common
  • 350 were in HDFS
  • 253 were in YARN
  • 61 were in MapReduce
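As a quick consistency check, the four per-component counts above should add up to the stated total. The short Python sketch below (the `fixes_by_component` name is our own, not part of any Hadoop tooling) sums them and prints each component's share of the 923 fixes:

```python
# Per-component JIRA counts for Apache Hadoop 2.7.0, as listed in the announcement.
fixes_by_component = {
    "Hadoop Common": 259,
    "HDFS": 350,
    "YARN": 253,
    "MapReduce": 61,
}

# The component counts should sum to the announced total.
total = sum(fixes_by_component.values())
print(f"Total JIRAs fixed: {total}")  # Total JIRAs fixed: 923

# Percentage of all fixes contributed by each component.
for component, count in fixes_by_component.items():
    share = 100 * count / total
    print(f"{component:<14} {count:>4}  {share:5.1f}%")
```

HDFS accounts for the largest single share of the fixes in this release.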

Hadoop 2.7.0 is the first Hadoop release in 2015, following late last year’s 2.6.0.…

Interest in Hadoop as a transformational data platform continues to grow around the world, as more enterprises build and deploy Hadoop solutions. Hortonworks has been a leader in this regard, as evidenced by the growth of the Hortonworks Data Platform (HDP) among both new and renewing customers worldwide. Customer demand for HDP applications and creative use cases keeps rising. As a result, demand for skilled professional services resources to guide HDP development and deployment represents a tremendous business opportunity for partners.…

Waterline Data is a Hortonworks Technology Partner that recently earned HDP Certification and YARN Ready status for its solution, which automates the inventory of data assets in the data lake, enables data governance, and gives data engineers and data scientists self-service tools to find and understand their data. Learn more by joining the upcoming webinar on May 6, or by downloading the Sandbox tutorial or the joint whitepaper. Our guest blogger is Oliver Claude, CMO at Waterline Data.…

In this blog, Kevin Petrie (Attunity Senior Director of Marketing) joins me to share thoughts on Hadoop and the Enterprise Data Warehouse.

Some believe that Hadoop and the Enterprise Data Warehouse (EDW) will continue to coexist side by side, solving different use cases. The peanut butter is over here, and the chocolate is over there.

At Hortonworks and Attunity, we see something else. We see how Hortonworks subscribers use Hortonworks Data Platform (HDP) for EDW optimization.…

Can you identify the unused data in your data warehouse? Are you using your “big data” efficiently? Are your data migration projects cost-effective? Is your data in compliance with industry regulations? If you answered “no” to any or all of these questions, then you may want to learn more about how to optimize your data warehouse.

On April 23rd at 11:00 am PST, Adis Cesir, Big Data Solution Engineer at Hortonworks; Ramu Kalvakuntla, Principal at RCG Global Services Big Data Practice; and Santosh Chitakki, Director of Product Management at Attunity, will discuss rebalancing data warehouses and integrating your current enterprise data warehouse with a Modern Data Architecture.…

Introduction

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs in Scala, Java, and Python that allow data workers to efficiently execute machine learning algorithms requiring fast iterative access to datasets. Spark on Apache Hadoop YARN enables deep integration with Hadoop and other YARN-enabled workloads in the enterprise.

In this blog, we will introduce the basic concepts of Apache Spark and the first steps needed to get started with Spark on the Hortonworks Sandbox.…

Enterprises across all major industries adopt Apache Hadoop for its ability to store and process an abundance of new types of data in a modern data architecture. This “Any Data” capability has always been a hallmark feature of Hadoop, unlocking insight from new data sources such as clickstream, web and social, geolocation, IoT, and server logs, as well as from traditional data sets in ERP, CRM, SCM, or other existing data systems.…

Hortonworks is pleased to announce the general availability of Apache Spark in Hortonworks Data Platform (HDP), now available on our downloads page. With HDP 2.2.4, Hortonworks now offers support for developers and data scientists using Apache Spark 1.2.1.

HDP’s YARN-based architecture enables multiple applications to share a common cluster and dataset while ensuring consistent levels of service and response. Spark is now one of the many data access engines that work with YARN and are supported in an HDP enterprise data lake.…

Hortonworks Data Platform (HDP) provides centralized enterprise services for comprehensive security, enabling end-to-end protection, access, compliance and auditing of data in motion and at rest. HDP’s centralized architecture, with Apache Hadoop YARN at its core, also enables consistent operations for the provisioning, management, monitoring and deployment of Hadoop clusters, resulting in a reliable enterprise-ready data lake.

But comprehensive security and consistent operations go together, and neither is possible in isolation.

We published two blogs recently announcing Ambari 2.0 and its new ability to manage rolling upgrades.…

Today we’re delighted to announce our acquisition of SequenceIQ. This acquisition, expected to close in Q2, will accelerate our ability to provide deployment automation for Enterprise Hadoop across public and private clouds. Please join us in welcoming the SequenceIQ team to the Hortonworks family!

Enterprises are embracing Apache Hadoop to enable their modern data architectures and power new analytic applications. The freedom to choose the on-premises or cloud environment for Hadoop that best meets their business needs is a critical requirement.…

Opportunity abounds

According to the enterprise data usage experts at Appfluent, the typical Enterprise Data Warehouse (EDW) dedicates 70% of its storage volume to unused data and 55% of its processing capacity to low value ETL workloads. This represents a waste of what could otherwise be a high performance, finely tuned analytics and reporting environment that supports enterprise priorities. Even worse, EDW environments often cannot deal with the varied structures of new data sources that offer so much untapped value.…
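To make the Appfluent percentages concrete, the Python sketch below applies them to a warehouse of a given size. The function name `edw_offload_estimate` is our own, the 100 TB and 1,000 processing-unit figures in the usage line are hypothetical inputs rather than Appfluent data, and the sketch assumes the “typical” percentages apply uniformly to any warehouse:

```python
# Shares reported by Appfluent for a typical EDW (from the paragraph above).
UNUSED_STORAGE_SHARE = 0.70   # fraction of storage volume holding unused data
LOW_VALUE_ETL_SHARE = 0.55    # fraction of processing capacity spent on low-value ETL


def edw_offload_estimate(storage_tb: float, processing_units: float) -> dict:
    """Hypothetical helper: split EDW capacity into the portions that the
    typical percentages flag as candidates for offloading, and what remains."""
    return {
        "unused_storage_tb": storage_tb * UNUSED_STORAGE_SHARE,
        "etl_processing_units": processing_units * LOW_VALUE_ETL_SHARE,
        "remaining_storage_tb": storage_tb * (1 - UNUSED_STORAGE_SHARE),
        "remaining_processing_units": processing_units * (1 - LOW_VALUE_ETL_SHARE),
    }


# Hypothetical warehouse: 100 TB of storage and 1,000 units of processing capacity.
estimate = edw_offload_estimate(100, 1000)
for key, value in estimate.items():
    print(f"{key}: {value:.1f}")
```

Under these assumptions, only about 30 TB of storage and 450 processing units in such a warehouse would be doing the high-value analytics and reporting work the paragraph describes.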

The recent post by Jayush Luniya announced the community release of Apache Ambari 2.0. One of the three key Ambari features that Jayush discussed was Rolling Upgrades, which enable Hadoop operators to upgrade from one version of HDP to the next with minimal disruption to the cluster.

The Hortonworks development team worked long and hard to make the Hadoop platform “rolling upgradeable”. That groundwork was available in Hortonworks Data Platform 2.2 as described in this previous post.…

This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting rolling upgrades and downgrades of an HDFS cluster. See this previous post for an introduction on enterprise-grade rolling upgrades in HDP 2.2.
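A rolling upgrade keeps the cluster serving while nodes move to a new version a few at a time. The Python sketch below is a deliberately simplified model of that idea, not the actual Ambari or HDFS procedure: the `Node` class, the `rolling_upgrade` function, and the `v1`/`v2` version labels are hypothetical, and real HDFS upgrades involve coordination steps that this model omits.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster node with an installed version and a service state."""
    name: str
    version: str
    online: bool = True


def rolling_upgrade(nodes: list[Node], target_version: str, batch_size: int = 1) -> int:
    """Move every node to target_version a batch at a time.

    Works for upgrades and downgrades alike, since it only swaps version strings.
    Returns the fewest nodes that were online at any point, a rough proxy for
    how much capacity the cluster kept while the change rolled through.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    min_online = len(nodes)
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            node.online = False  # take the node out of service
        min_online = min(min_online, sum(n.online for n in nodes))
        for node in batch:
            node.version = target_version  # install the target version
            node.online = True  # return the node to service
    return min_online


cluster = [Node(f"node{i}", "v1") for i in range(5)]
print(rolling_upgrade(cluster, "v2"))                # 4: one node down at a time
print(rolling_upgrade(cluster, "v1", batch_size=2))  # 3: a downgrade, two nodes at a time
```

In this model, batch size is the main trade-off: larger batches finish in fewer rounds but leave fewer nodes in service at any moment.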

Hortonworks Data Platform provides centralized enterprise services for consistent operations of Hadoop clusters for a reliable enterprise-ready data lake.…