Security is a top agenda item and a critical requirement for Hadoop projects. Over the years, Hadoop has evolved to address key concerns regarding authentication, authorization, accounting, and data encryption natively within a cluster, and there are many secure Hadoop clusters in production. Hadoop is being used securely and successfully today in sensitive financial services applications, private healthcare initiatives, and a range of other security-sensitive environments. As enterprise adoption of Hadoop grows, so do the security concerns, and a roadmap to embrace and incorporate these enterprise security features has emerged.…
From the Dev Team
Follow the latest developments from our technical team
The Apache Tez team is proud to announce the first release of Apache Tez – version 0.2.0-incubating.
Apache Tez is an application framework which allows for a complex directed-acyclic-graph of tasks for processing data and is built atop Apache Hadoop YARN. You can learn much more from our Tez blog series tracked here.
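To make the directed-acyclic-graph idea concrete, here is a minimal sketch in plain Python. It does not use the Tez API; the task names and the `execution_order` helper are hypothetical, chosen only to show how dependencies between tasks constrain the order in which they can run.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph (not Tez code): each key is a task and each
# value is the set of tasks that must finish before it can start.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
}


def execution_order(graph):
    """Return one valid order in which the tasks could run.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is why the graph must be acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
print(order)
```

Both scan tasks have no dependencies, so a real scheduler could run them in parallel; the join has to wait for both, and the aggregation has to wait for the join.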
Since entering the Apache Incubator project in late February of 2013, there have been over 400 tickets resolved, culminating in this significant release.…
We are very excited to announce that Apache Ambari has graduated from the Apache Incubator and is now an Apache Top Level Project! Hortonworks introduced Ambari as an Apache Incubator project back in August 2011 with the vision of making Hadoop cluster management dead simple. In a little over two years, the development community grew significantly, from a small team at Hortonworks to a large number of contributors from various organizations beyond Hortonworks; upon graduation, there were more than 60 contributors, 37 of whom had become committers.…
We believe the fastest path to innovation is the open community and we work hard to help deliver this innovation from the community to the enterprise. However, this is a two-way street. We are also hearing very distinct requirements being voiced by the broad enterprise as they integrate Hadoop into their data architecture.
Open Source, Open Community & An Open Roadmap for Dataset Management
Over the past year, a set of enterprise requirements has emerged for dataset management. …
A recent survey conducted by the OpenStack Foundation shows incredible adoption in the enterprise. Cost savings and operational efficiency stand out as the top business motivators that are driving broad adoption of OpenStack across industry verticals. It was of particular interest to see that roughly 30% of the deployments are in production. Above all, I was definitely not surprised to see Hadoop among the top 10 workloads on OpenStack.
Hadoop is the Perfect App for OpenStack
In just a few years, interest in Hadoop has enjoyed a meteoric rise. It is everywhere… and it should be available everywhere.
Here at Hortonworks we have worked to provide the widest range of deployment options for Hadoop… from on-premises to the cloud, Linux and Windows, and from commodity server clusters to high-end appliances. Deployment options are critical to the adoption of Hadoop.
Today, we add Ubuntu to the list of options we support for HDP 2.0.…
User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application: correctness problems with the application itself, race conditions when running on a cluster, and task/job failures due to hardware or platform bugs. Secondly, one can do historical analyses of the logs to see how individual tasks in a job/workflow perform over time. One can even analyze the Hadoop MapReduce user-logs using Hadoop MapReduce(!) to determine any performance issues.…
This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.
“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”
This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive.…
This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is here.
One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself.…
This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the ninth post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
- Stabilizing YARN APIs for Apache Hadoop 2
- Management of Application Dependencies
- Resource Localization in YARN: Deep Dive
In the previous post, we explained the basic concepts of LocalResources and resource localization in YARN.…
This post is the seventh in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:
- Apache Tez: A New Chapter in Hadoop Data Processing
- Data Processing API in Apache Tez
- Runtime API in Apache Tez
- Writing a Tez Input/Processor/Output
- Apache Tez: Dynamic Graph Reconfiguration
- Reusing containers in Apache Tez
- Introducing Tez Sessions
In Tez, we recently introduced the support of a feature that we call “Tez Sessions”.…
One of the great things about working in open source development is working with other experts around the world on big projects, and then having the results of that work in the hands of users within a short period of time.
This is why I’m really excited about the Rackspace announcement of their HDP-based Big Data offerings, both “on-prem” and in the cloud. Not just because it’s one of our partners offering a service based on Hadoop, but because it shows how Hadoop integration with OpenStack has reached a point where it’s ready for production use.…
The Apache Knox community announced the release of the Apache Knox Gateway (Incubator) 0.3.0. We, at Hortonworks, are excited about this announcement.
The Apache Knox Gateway is a REST API Gateway for Hadoop with a focus on enterprise security integration. It provides a simple and extensible model for securing access to Hadoop core and ecosystem REST APIs.
Apache Knox provides pluggable authentication to LDAP and trusted identity providers as well as service level authorization and more. …
With the attention of the Hadoop community on Strata/Hadoop World in New York this week, it seems an appropriate time to give everyone an early update on continued community development of Apache Hive. This progress well and truly cements Hive as the standard open-source SQL solution for the Apache Hadoop ecosystem, not just for extremely large-scale batch queries but also for low-latency, human-interactive queries.
You can catch me at our session ‘Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop’ along with Owen and Alan where we’ll be happy to dive into more of the details.…
I’d like to take a quick moment to welcome Julian Hyde as the latest addition to the Hortonworks engineering team. Julian has a long history of working on data platforms, including development of SQL engines at Oracle, Broadbase, and SQLstream. He was also the architect and primary developer of the Mondrian OLAP engine, part of the Pentaho BI suite.
Julian’s latest role has been as the author and architect of the Optiq project – an Apache licensed open source framework.…