The Hortonworks Blog

Hadoop operations for provisioning, managing, and monitoring a cluster are critical to the success of a Hadoop project, and an intuitive, effective set of tooling has become a foundational element of a Hadoop distribution. Within HDP, we provide the completely open source Apache Ambari to help you succeed with Hadoop operations.

The rate of innovation in the Ambari community is astonishing, and the pace continues with Apache Ambari 1.7.0, the seventh release of the project this year alone.…
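For readers who want to kick the tires on the operations tooling, here is a minimal sketch of pulling cluster information from the Ambari REST API, the same interface the Ambari web UI is built on. The host name, port, and admin/admin credentials below are placeholder assumptions for a default installation, not values from the post.

```java
// Minimal sketch: list the clusters an Ambari server manages via its REST API.
// Host, port and credentials are illustrative assumptions for a default install.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class AmbariClustersExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://ambari.example.com:8080/api/v1/clusters");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    String auth = Base64.getEncoder()
        .encodeToString("admin:admin".getBytes("UTF-8"));  // default credentials
    conn.setRequestProperty("Authorization", "Basic " + auth);

    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON listing of clusters managed by Ambari
      }
    }
  }
}
```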

Data platforms within enterprises are in the midst of a generational shift. After relying successfully on databases for decades, leading organizations today are complementing their data platforms with Apache Hadoop in a Data Lake environment to create a Modern Data Architecture (MDA). Hadoop, with its scale-out and schema-free architecture, enables organizations to store and analyze all of their structured and unstructured data in a single consolidated data environment. A key partner in the Hadoop journey has been the complementary infrastructure of servers, storage, and networking.…

Our customers have many choices of infrastructure for deploying HDP: on premises, in the cloud, virtualized, and even as an appliance. Further, our customers have a choice of deploying on Linux or Windows operating systems. You can easily see that this creates a complex matrix. At Hortonworks, we believe you should not be limited to just one option but should be free to choose the best combination of infrastructure and operating system for your usage scenario.…

We are very pleased to announce that the Hortonworks Data Platform (HDP) version 2.2 is now generally available for download. With thousands of enhancements across all elements of the platform, spanning data access, security, governance, rolling upgrades, and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of a Modern Data Architecture (MDA).

HDP 2.2 represents the very latest innovation from across the Hadoop ecosystem, where hundreds of developers have been collaborating with us to evolve each of the individual Apache Software Foundation (ASF) projects that make up the broader platform.…

I’m incredibly excited to announce the launch of a combined HP Vertica – Hortonworks Sandbox. The new combined Sandbox is available now as a free download from the HP Vertica Marketplace; all you need to do is sign up for a free account.

Once you have an account set up, you can easily navigate to the Hadoop icon on the left-hand side of the page and click through to the Hortonworks icon.…

It gives me great pleasure to announce that the Apache Hadoop community has released Apache Hadoop 2.6.0!

In particular, we are excited about three major pieces in this release: heterogeneous storage in HDFS with SSD and memory tiers, support for long-running services in YARN, and rolling upgrades—the ability to upgrade your cluster software and restart upgraded nodes without taking the cluster down or losing work in progress. With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways.…
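As a concrete illustration of the heterogeneous storage feature, the sketch below asks HDFS to keep a directory's replicas on the SSD tier. It is a minimal, hypothetical example: the path is made up, the DataNodes must actually expose SSD storage types for the policy to take effect, and the same result can be achieved from the command line with the hdfs storagepolicies tool.

```java
// Minimal sketch: pin a working directory to the SSD storage tier using the
// storage-policy API introduced with HDFS heterogeneous storage.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SsdTierExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      Path hotData = new Path("/apps/interactive/working");  // hypothetical path
      dfs.mkdirs(hotData);
      // Ask HDFS to place replicas for this directory on SSD volumes.
      dfs.setStoragePolicy(hotData, "ALL_SSD");
    }
    fs.close();
  }
}
```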

The successful Hadoop journey typically starts with new analytic applications, which lead to a Data Lake. As more and more applications are created that derive value from new types of data, an architectural shift happens in the data center: companies gain deeper insight across a large, diverse set of data at efficient scale. They create a Data Lake.

Cisco and Hortonworks have partnered to build a highly efficient, highly scalable way to manage all your enterprise data in a data lake.…

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. Apache Tez supports YARN-based, high performance batch and interactive data processing applications in Hadoop that need to handle datasets scaling to terabytes or petabytes.

The Apache community just released Apache Pig 0.14.0, and the main feature is Pig on Tez.…
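For a sense of what "Pig on Tez" means in practice, here is a minimal sketch of running an existing Pig script on the Tez engine programmatically. The script path is a hypothetical placeholder, and the command-line equivalent is simply pig -x tez.

```java
// Minimal sketch: run an existing Pig script on the Tez execution engine.
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class PigOnTezExample {
  public static void main(String[] args) {
    // "-x tez" selects the Tez engine instead of classic MapReduce;
    // everything else about the script stays the same.
    String[] pigArgs = { "-x", "tez", "/scripts/wordcount.pig" };  // hypothetical script
    PigStats stats = PigRunner.run(pigArgs, null);
    System.out.println("Succeeded: " + stats.isSuccessful());
  }
}
```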

While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL, and Apache Hive is still the de facto standard. Although many SQL engines for Hadoop have emerged, their differentiation is being rendered obsolete as the open source community surrounds and advances this key engine at an accelerated rate.

Last week, the Apache Hive community released Apache Hive 0.14, which includes the results of the first phase of the Stinger.next initiative and takes Hive beyond its read-only roots by extending it with ACID transactions.…
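To make the ACID addition concrete, the sketch below exercises it over JDBC: a bucketed ORC table flagged as transactional, followed by an INSERT and an UPDATE. The host, table, and data are illustrative assumptions, and the cluster must have the Hive transaction manager and compactor enabled for the statements to succeed.

```java
// Minimal sketch: create a transactional Hive table and write to it over JDBC.
// Host, database, table and values are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAcidExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hiveserver2.example.com:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {

      // ACID tables must be stored as ORC, bucketed, and flagged transactional.
      stmt.execute("CREATE TABLE IF NOT EXISTS customer_contacts (" +
                   "  id INT, email STRING)" +
                   " CLUSTERED BY (id) INTO 4 BUCKETS" +
                   " STORED AS ORC" +
                   " TBLPROPERTIES ('transactional'='true')");

      stmt.execute("INSERT INTO TABLE customer_contacts VALUES (1, 'ada@example.com')");
      // UPDATE and DELETE are the new write paths that ACID makes possible.
      stmt.execute("UPDATE customer_contacts SET email = 'ada@new.example.com' WHERE id = 1");
    }
  }
}
```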

Hortonworks is pleased to be part of the “going green” movement and even more pleased to introduce guest bloggers from Actian and Slingshot Power. In this blog, Slingshot Power describes how Hadoop and analytics can influence and increase the adoption of clean energy.

By Ashish Gupta, CMO & SVP Business Development, Actian

Recently, together with Slingshot Power, we announced their use of the Hortonworks Data Platform (HDP) and the Actian Analytics Platform – Hadoop SQL Edition.…

A Cosmopolitan Metropolis

Brussels, Belgium, conjures images of a cosmopolitan metropolis, where geopolitical summits are held, where world economic forums are debated, where global European institutions are headquartered, and where citizens and diplomats fluently converse in more than three languages—English, French, Dutch, and German, along with other non-official local flavors.

To this colorful collage, add the image of Hadoop Summit Europe 2015 for big data developers, practitioners, industry experts, and entrepreneurs who make a difference in the digital world, who fluently code in multiple programming languages—Java, Python, Scala, C++, Pig, SQL, or R—and who innovate and incubate Apache projects.…

Big data continues to dominate the discussion as businesses both big and small seek to make sense of what exactly it is and, more importantly, what they should do about it. The three biggest challenges associated with big data investments are determining how to get value from data, defining the big data strategy, and obtaining the skills and capabilities needed to make sense of it in a meaningful way.

Join our webinar Thursday Nov.

Two weeks ago Hortonworks presented the third in a series of eight Discover HDP 2.2 webinars: Discover HDP 2.2: Apache Falcon for Hadoop Data Governance. Andrew Ahn, Venkatesh Seetharam, and Justin Sears hosted this third webinar in the series.

After Justin Sears set the stage for the webinar by explaining the drivers behind the Modern Data Architecture (MDA), Andrew Ahn and Venkatesh Seetharam introduced and discussed how to use Apache Falcon for central management of the data lifecycle, business continuity and disaster recovery, and audit and compliance requirements.…

Increasingly, companies around the world are adopting Apache Hadoop as a core component of their Modern Data Architecture (MDA) in order to collect, store, analyze and manipulate massive quantities of data on their own terms—regardless of the source of that data, how old it is, where it is stored, or what format it is in. Once they build their Modern Data Architecture, what is the best way for them to manage and monitor their Hadoop clusters?…

Introduction

With the rapid adoption of Apache Hadoop, enterprises use machine learning as a key technology to extract tangible business value from their massive data assets. This derivation of business value is possible because Apache Hadoop YARN, as the architectural center of the Modern Data Architecture (MDA), allows purpose-built data engines such as Apache Tez and Apache Spark to process and iterate over multiple datasets for data science within the same cluster.…
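As one hypothetical example of the kind of iterative machine-learning workload YARN can host alongside other engines, the sketch below runs k-means clustering with Spark MLlib over a tiny in-memory dataset. In practice the data would come from the Data Lake and the job would be submitted to the cluster with spark-submit --master yarn-client; the values and cluster count here are purely illustrative.

```java
// Minimal sketch: k-means clustering with Spark MLlib, an iterative workload
// that benefits from caching the same dataset across many passes.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class KMeansOnYarnExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("kmeans-sketch");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Tiny in-memory dataset standing in for features read from the Data Lake.
    JavaRDD<Vector> points = sc.parallelize(Arrays.asList(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2),
        Vectors.dense(9.0, 9.1), Vectors.dense(9.2, 8.9)));
    points.cache();  // k-means iterates over the same data many times

    KMeansModel model = KMeans.train(points.rdd(), 2, 20);  // k=2, 20 iterations
    for (Vector center : model.clusterCenters()) {
      System.out.println("cluster center: " + center);
    }
    sc.stop();
  }
}
```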