The Hortonworks Blog

Posts categorized by: Innovation from Hortonworks
YARN and Apache Storm: A Powerful Combination

YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside many others, coordinated by YARN.

YARN’s architecture makes it much easier for users to build and run multiple applications in Hadoop, all sharing a common resource manager.…

The open source community, including Hortonworks, has invested heavily in building enterprise-grade security for Apache Hadoop. These efforts include Apache Knox for perimeter security, Kerberos for strong authentication, and the recently announced Apache Argus incubator, which brings a central administration framework for authorization and auditing.

Join Hortonworks and Voltage Security in a webinar on August 27 to learn more.

In multi-platform environments with data coming from many different sources, personally identifiable information, credit card numbers, and intellectual property can land in the Hadoop cluster.…

This summer, Hortonworks presented the Discover HDP 2.1 Webinar series. Our developers and product managers highlighted the latest innovations in Apache Hadoop and related Apache projects.

We’re grateful to the more than 1,000 attendees whose questions added rich interaction to the pre-planned presentations and demos.

If you missed one of the 30-minute webinars (or want to review one you joined live), you can find recordings of all sessions on our What’s New in 2.1 page.…

Zettaset is a Hortonworks partner. In this guest blog, John Armstrong, VP of Marketing at Zettaset Inc., describes Zettaset’s security features and explains why encryption is vital for data in the Hadoop infrastructure.

Comprehensive Security Across the Hadoop Infrastructure

As big data technologies like Hadoop become widely deployed in production environments, the expectation is that they will meet enterprise requirements for data governance, operations, and security while integrating with existing data center infrastructure. …

The key to monetizing Big Data is not only the ability to capture and process information quickly but also the ability to analyze that data to derive meaningful insights. Big Data can be complex, and the expertise of a programmer is often needed to create focused, targeted queries.

0xdata, a provider of open source machine learning and predictive analytics for Big Data, helps statisticians manipulate and extract data with its H2O prediction engine. …

The Journey

Almost two years ago to the day, the Apache Hadoop community voted to make YARN a sub-project of Apache Hadoop, and the GA release followed last fall, nearly a year ago.

Since then, it has become clear that Apache Hadoop 2.x, with YARN as its architectural center, is the best platform for workloads such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive, which were designed to process data stored in HDFS.…

This week we continue our YARN webinar series with a detailed introduction and developer overview of Apache Tez. Designed to express fit-to-purpose data processing logic, Tez enables batch and interactive data processing applications on datasets ranging from terabytes to petabytes. Tez offers a customizable execution architecture that lets developers express complex computations as dataflow graphs and allows dynamic performance optimizations based on real information about the data and the resources required to process it.…

We are in the midst of a data revolution. Hadoop, powered by Apache Hadoop YARN, enables enterprises to store, process, and innovate around data at a scale never seen before, which makes security a critical consideration. To realize the full potential of the Hadoop platform unleashed by YARN, the architectural center and data operating system of Hadoop 2, enterprises are looking for a comprehensive approach to securing their data.

Hortonworks and the open community continue to work tirelessly to enhance security in Hadoop.…

ScaleOut Software joined the Hortonworks Technology Partner Program and recently achieved Hortonworks Certified status for ScaleOut hServer. ScaleOut Software is a pioneer in in-memory data grid software; ScaleOut hServer installs directly on Hadoop nodes and runs in memory. In this guest blog, William Bain, ScaleOut’s founder and CEO, discusses the certification and a use case.

Recently, ScaleOut Software announced technical certification of its ScaleOut hServer® product on Hortonworks Data Platform 2.1.…

This is a guest blog from Protegrity, a Hortonworks certified partner.

As Hadoop takes on a more mission-critical role in the data center, the top IT imperatives of process innovation, operational efficiency, and data security naturally follow. One imperative in particular now tops the list of requirements for adopting Hadoop in the enterprise: a well-developed framework to secure data.

The open source community has responded. Work is underway to build out a comprehensive and coordinated security framework for Hadoop that can work well with existing IT security investments.…

Introduction

HDP 2.1 ships with Apache Knox 0.4.0. This release of Apache Knox supports the WebHDFS, WebHCat, Oozie, Hive, and HBase REST APIs.

Hive is a popular component for SQL access to Hadoop, and HiveServer2 supports JDBC access over HTTP through its Thrift HTTP transport. The following steps show how to configure a JDBC client to talk to HiveServer2 through Knox (Beeline → JDBC over HTTPS → Knox → HTTP → HiveServer2).…
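A minimal sketch of the client end of that chain, assuming the connection-string layout described for Hive in the Apache Knox user guide: the TLS settings for the Beeline-to-Knox hop go before the `?`, and the HTTP transport settings Knox uses to reach HiveServer2 go after it. The helper name `knox_hive_jdbc_url`, the host `sandbox`, port 8443, the trust-store path and password, and the topology name are hypothetical placeholders, not values from the original post.

```python
def knox_hive_jdbc_url(gateway_host: str, gateway_port: int,
                       truststore_path: str, truststore_password: str,
                       gateway_path: str = "gateway",
                       cluster_name: str = "default") -> str:
    """Build a HiveServer2 JDBC URL routed through a Knox gateway.

    Hypothetical helper; the URL layout is an assumption based on the
    Knox user guide's Hive section, not an official API.
    """
    # Session settings: HTTPS between the JDBC client and Knox,
    # trusting the gateway's certificate via a local trust store.
    session = (f";ssl=true;sslTrustStore={truststore_path}"
               f";trustStorePassword={truststore_password}")
    # Hive settings: use the HTTP transport and the Knox topology path
    # (<gateway path>/<cluster/topology name>/hive) instead of raw Thrift.
    hive_conf = ("hive.server2.transport.mode=http;"
                 f"hive.server2.thrift.http.path={gateway_path}/{cluster_name}/hive")
    return f"jdbc:hive2://{gateway_host}:{gateway_port}/{session}?{hive_conf}"


if __name__ == "__main__":
    # Placeholder values for a single-node sandbox-style setup.
    print(knox_hive_jdbc_url("sandbox", 8443, "/path/to/gateway.jks",
                             "changeit", cluster_name="sandbox"))
```

The resulting string is what a client such as Beeline would pass to `!connect`. On the server side, HiveServer2 would also need to run with `hive.server2.transport.mode=http` in `hive-site.xml` so that Knox can forward requests to it over HTTP.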

This is a guest blog from Voltage Security, a Hortonworks partner.

Data security for Hadoop is a critical requirement for adoption within the enterprise. Organizations must protect sensitive customer, partner, and internal information and adhere to an ever-increasing set of compliance requirements. The security challenges these organizations face are diverse, and the technology is evolving rapidly to keep pace.

An Open Community For Platform Security

The open source community, including Hortonworks, has invested heavily in building enterprise-grade security for Apache Hadoop. …

In May, Hortonworks acquired XA Secure and promised to contribute its technology to the Apache Software Foundation. In June, we made it available for anyone to download and use from our website, and today we are proud to announce that this technology officially lives on as Apache Argus, an incubator project within the ASF.

The podling has been formed, and the process of graduating Argus to a top-level project (TLP) has now begun.…

This is a guest post from Hortonworks partner Dataguise. Dataguise is an HDP 2.1 certified technology partner providing sensitive data discovery, protection, and reporting in Hadoop.

According to a 2013 Global Data Breach study by the Ponemon Institute, the average cost of data loss in the United States exceeds $5.4 million per breach, and the average cost per lost record approaches $200. No industry is spared from this threat, and every data system, including Hadoop, needs to address security.…

In this blog, Hortonworks software engineers Vinod Kumar Vavilapalli (Apache Hadoop YARN committer) and Jian He (Apache Hadoop YARN committer) discuss the resiliency of the Apache Hadoop YARN ResourceManager upon restart. This is the third post in our series on the motivations and architecture behind improvements to the resiliency of the YARN ResourceManager (RM). Others in the series are:

Introduction
Phase II – Preserving work-in-progress of running applications

ResourceManager restart is a critical feature that allows YARN applications to continue functioning even when the ResourceManager (RM) crashes and restarts for any reason.…
