From the Dev Team

Follow the latest developments from our technical team

Hadoop can be a great complement to existing data warehouse platforms, such as Teradata, as it naturally helps to address two key storage challenges:

The purpose of this article is to detail some of the key integration points and to show how data can be easily exchanged for enrichment between the two platforms.

As a data integrator who is familiar with RDBMS systems and is new to the Hadoop platform, I was looking for a simple way (i.e.…

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets one mirror the production environment in a VM while staying with all the IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.…

This article originally appeared at Opensource.com and is reproduced here.

There are rapidly growing feature set, high commit rates, and code contributions happening across the globe to Apache Hadoop and related Apache Software Foundation projects. However, the number of woman developerscommitters, and Project Management Committee (PMC) members in this vast and diversified ecosystem are really diminutive. For the Hadoop project alone, only 5% out of 84 committers are women; and this has been the case for over the past 2 years.…

I recently sat down with Mahadev Konar and Jeff Sposetti to discuss Apache Ambari v1.4.1. Ambari 1.4.1 is a single framework to provision, manage and monitor clusters based on the Hadoop 2 stack, with YARN and NameNode HA on HDFS.

Mahadev is one of the original architects of Apache Hadoop, a co-founder of Hortonworks, and a committer on Apache Ambari and Apache ZooKeeper. Jeff is the Hortonworks product manager focused on Apache Ambari and Apache Falcon.…

I recently sat down with Himanshu Bari to discuss how Apache Ambari will serve as the single point of management for Hadoop 2 clusters integrated with Apache Storm and its real-time, streaming event processing.

Himanshu discusses Apache Storm’s five key benefits and how those will add to the power and stability of a Hadoop 2 stack, providing analysis of huge data flows from the second data is created and then for decades of historical analysis of that data stored in HDFS.…

I recently sat down with Devaraj Das and Carter Shanklin to discuss the dramatic improvements delivered in Apache HBase version 0.96 included in HDP 2.0.

Now HBase runs on Windows and (whether on Linux or Windows) it recovers from failures much more quickly, with dramatic improvements in mean time to recovery (MTTR).

Devaraj is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on HBase.…

Installing the Hortonworks Data Platform 2.0 for Windows is straightforward. Lets take a look at how to install a one node cluster on your Windows Server 2012 R2 machine.

To start, download the HDP 2.0 for Windows package. The package is under 1 GB, and will take a few moments to download depending on your internet speed. Documentation for installing a single node instance is located here. This blog post will guide you through that instruction set to get you going with HDP 2.0 for Windows!…

I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements delivered by the Stinger Initiative to version 0.12 of Apache Hive, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.

Owen is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive v0.12, which is included in Hortonworks Data Platform v2.0.…

One aspect of community development of Apache Hadoop is the way that everyone working on Hadoop -full time, part time, vendors, users and even some researchers all collaborate together in the open. This developed is based on publicly accessible project tools: Apache Subversion for revision control, Apache Maven for the builds; Jenkins for automating those builds and tests. Central to a lot of work is the Apache JIRA server, an instance of Atlassian’s issue management tool.…

Whether you were busy finishing up last minute Christmas shopping or just taking time off for the holidays, you might have missed that Hortonworks released the Stinger Phase 3 Technical Preview back in December. The Stinger Initiative is Hortonworks’ open roadmap to making Hive 100x faster while adding standard SQL. Here we’ll discuss 3 great reasons to give Stinger Phase 3 Preview a try to start off the new year.

Reason 1: It’s The Fastest Hive Yet

Whether you want to process more data or lower your time-to-insight, the benefits of a faster Hive speak for themselves.…

Hadoop has traditionally been used for batch processing data at large scale. Batch processing applications care more about raw sequential throughput than low-latency and hence the existing HDFS model where all attached storages are assumed to be spinning disks has worked well.

There is an increasing interest in using Hadoop for interactive query processing e.g. via Hive. Another class of applications makes use of random IO patterns e.g. HBase. Either class of application benefits from lower latency storage media.…

The network and security teams at your company do not allow internet access from the machines where you plan to install Hadoop. What do you do? How do you install your Hadoop cluster without having access to the public software packages? Apache Ambari supports local repositories and in this post we’ll look at the configuration needed for that support.

When installing Hadoop with Ambari, there are three repositories at play: one for Ambari – which primarily hosts the Ambari Server and Ambari Agent packages) and two repositories for the Hortonworks Data Platform – which hosts the HDP Hadoop Stack packages and other related utilities.…

Update! – The final phase of improvements from the Stinger Initiative were released as part of Hive 0.13 on Apr 21, 2014 – Read the announcement

While just a preview by moniker, the release marks a significant milestone in the transformation of Hadoop from a batch-oriented system to a data platform capable of interactive data processing at scale and delivering on the aims of the Stinger Initiative.

Apache Tez and SQL: Interactive Query-IN-Hadoop

Tez is a low-level runtime engine not aimed directly at data analysts or data scientists.…

Encryption is applied to electronic information in order to ensure its privacy and confidentiality.  Typically, we think of protecting data as it rests or in motion.  Wire Encryption protects the latter as data moves through Hadoop over RPC, HTTP, Data Transfer Protocol (DTP), and JDBC.

Let’s cover the configuration required to encrypt each of these protocols. To see the step-by-step instructions please see the HDP 2.0 documentation.

RPC Encryption

The most common way for a client to interact with a Hadoop cluster is through RPC.  …

Last week was a busy week for shipping code, so here’s a quick recap on the new stuff to keep you busy over the holiday season.

Go to page:« First...34567...10...Last »