From the Dev Team

Follow the latest developments from our technical team

This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is here.

One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself.…

This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the ninth post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:

Introduction

In the previous post, we explained the basic concepts of LocalResources and resource localization in YARN.…

This post is the seventh in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

In Tez, we recently introduced the support of a feature that we call “Tez Sessions”.…

One of the great things about working in open source development is working with other experts around the world on big projects – and then having the results of that work in the hands of users within a short period of time.

This is why I’m really excited about the Rackspace announcement of their HDP-based Big Data offerings, both “on-prem” and in the cloud. Not just because one of our partners is offering a service based on Hadoop, but because it shows how Hadoop integration with OpenStack has reached a point where it’s ready for production use.…

The Apache Knox community announced the release of the Apache Knox Gateway (Incubator) 0.3.0. We, at Hortonworks, are excited about this announcement.

The Apache Knox Gateway is a REST API Gateway for Hadoop with a focus on enterprise security integration.  It provides a simple and extensible model for securing access to Hadoop core and ecosystem REST APIs.

Apache Knox provides pluggable authentication to LDAP and trusted identity providers as well as service level authorization and more.  …

With the attention of the Hadoop community on Strata/Hadoop World in New York this week, it seems an appropriate time to give everyone an early update on continued community development of Apache Hive. This progress well and truly cements Hive as the standard open-source SQL solution for the Apache Hadoop ecosystem, not just for extremely large-scale batch queries but also for low-latency, human-interactive queries.

You can catch me at our session ‘Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop’ along with Owen and Alan where we’ll be happy to dive into more of the details.…

I’d like to take a quick moment to welcome Julian Hyde as the latest addition to the Hortonworks engineering team. Julian has a long history of working on data platforms, including development of SQL engines at Oracle, Broadbase, and SQLstream. He was also the architect and primary developer of the Mondrian OLAP engine, part of the Pentaho BI suite.

Julian’s latest role has been as the author and architect of the Optiq project – an Apache licensed open source framework.…

We’re continuing our series of quick interviews with Apache Hadoop project committers at Hortonworks.

This week Mahadev Konar discusses Apache Ambari, the open source Apache project to simplify management of a Hadoop cluster.

Mahadev was on the team at Yahoo! in 2006 that started developing what became Apache Hadoop. Since then, he has also held leadership positions in the Apache ZooKeeper and Apache Ambari projects. He is an architect and project management committee member for Apache Ambari, Apache ZooKeeper and Apache Hadoop.…

What a difference a year makes! Last Fall Ambari was a nascent Apache project that had recently shipped an inaugural release in the community. Fast forward a bit, at the beginning of this year Ambari shipped what has become the foundation for rapid innovation. Now Ambari has become a key member of the Apache Hadoop project ecosystem and a trusted operational platform for many companies.

Let’s take a brief look at the community’s amazing accomplishments over the past year, and then take some time to look forward.…

The Hadoop Distributed File System is the reliable and scalable data core of the Hortonworks Data Platform. In HDP 2.0, YARN + HDFS combine to form the distributed operating system for your Data Platform, providing resource management and scalable data storage to the next generation of analytical applications.

Over the past six months, HDFS has gained a slew of major features covering Enterprise Multi-tenancy, Business Continuity Processing and Enterprise Integration:

  • Enabled automated failover with a hot standby and full stack resiliency for the NameNode master service
  • Added enterprise standard NFS read/write access to HDFS
  • Enabled point in time recovery with Snapshots in HDFS
  • Added wire encryption for the HDFS Data Transfer Protocol

Looking forward, there are evolving patterns in Data Center infrastructure and Analytical applications that are driving the evolution of HDFS.…

This post is the sixth in our series on the motivations, architecture and performance gains of Apache Tez for data processing in Hadoop. The series has the following posts:

Motivation

Tez follows the traditional Hadoop model of dividing a job into individual tasks, all of which are run as processes via YARN, on the users’ behalf – for isolation, among other reasons.…

The last couple of weeks have been a period of intense activity around the Apache projects that comprise the Hadoop ecosystem. While most of the headlines were accorded to Apache Hadoop 2 going GA, it would be remiss not to pay attention to the great progress being made in the Apache projects that complement Hadoop.

We have blogged about these over the course of the past week and the list below provides a quick summary of the phenomenal work contributed in the open by the folks driving these diverse and vital communities.…

Today we are proud to announce the general availability of Apache Pig 0.12!

If you are a Pig user and you’ve been yearning for additional languages, more data validation tools, and more expressions, operators and data types, then read on. Version 0.12 includes all of those additions, and Pig now runs on Windows without Cygwin.

This was a great team effort over the past six months with over 30 engineers from Twitter, Yahoo, LinkedIn, Netflix, Microsoft, IBM, Salesforce, Mortardata, Cloudera and several others (including Hortonworks of course).…

Today we are proud to announce the delivery of Apache Ambari 1.4.1. Ambari 1.4.1 combines many months of work in the community advancing the Ambari codebase. Over 760 JIRAs have been resolved since the Ambari 1.2.5 release. We would like to thank the nearly 40 engineers who contributed to help make this release possible.

Hello Hadoop 2, Meet Apache Ambari

The most important addition to Ambari 1.4.1 is support for installing, managing and monitoring a cluster based on the Hadoop 2 stack.…

The Hortonworks HBase team is excited to see HBase 96 released. It represents a broad community effort and a massive amount of work that has been building for more than a year.

HBase 96 closes out over 2000 issues (2134 Jira tickets to be exact) and represents the collective work of a VERY active community. Kudos to everyone involved! As the authors in a recent Apache blog alluded to, the HBase community is very healthy and includes developers from many companies including Hortonworks, Yahoo!, Cloudera, Salesforce, eBay, Intel, and Facebook, to name just a few.…
