From the Dev Team

Follow the latest developments from our technical team

Installing the Hortonworks Data Platform 2.0 for Windows is straightforward. Lets take a look at how to install a one node cluster on your Windows Server 2012 R2 machine.

To start, download the HDP 2.0 for Windows package. The package is under 1 GB, and will take a few moments to download depending on your internet speed. Documentation for installing a single node instance is located here. This blog post will guide you through that instruction set to get you going with HDP 2.0 for Windows!…

I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements delivered by the Stinger Initiative to version 0.12 of Apache Hive, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.

Owen is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive v0.12, which is included in Hortonworks Data Platform v2.0.…

One aspect of community development of Apache Hadoop is the way that everyone working on Hadoop -full time, part time, vendors, users and even some researchers all collaborate together in the open. This developed is based on publicly accessible project tools: Apache Subversion for revision control, Apache Maven for the builds; Jenkins for automating those builds and tests. Central to a lot of work is the Apache JIRA server, an instance of Atlassian’s issue management tool.…

Whether you were busy finishing up last minute Christmas shopping or just taking time off for the holidays, you might have missed that Hortonworks released the Stinger Phase 3 Technical Preview back in December. The Stinger Initiative is Hortonworks’ open roadmap to making Hive 100x faster while adding standard SQL. Here we’ll discuss 3 great reasons to give Stinger Phase 3 Preview a try to start off the new year.

Reason 1: It’s The Fastest Hive Yet

Whether you want to process more data or lower your time-to-insight, the benefits of a faster Hive speak for themselves.…

Hadoop has traditionally been used for batch processing data at large scale. Batch processing applications care more about raw sequential throughput than low-latency and hence the existing HDFS model where all attached storages are assumed to be spinning disks has worked well.

There is an increasing interest in using Hadoop for interactive query processing e.g. via Hive. Another class of applications makes use of random IO patterns e.g. HBase. Either class of application benefits from lower latency storage media.…

The network and security teams at your company do not allow internet access from the machines where you plan to install Hadoop. What do you do? How do you install your Hadoop cluster without having access to the public software packages? Apache Ambari supports local repositories and in this post we’ll look at the configuration needed for that support.

When installing Hadoop with Ambari, there are three repositories at play: one for Ambari – which primarily hosts the Ambari Server and Ambari Agent packages) and two repositories for the Hortonworks Data Platform – which hosts the HDP Hadoop Stack packages and other related utilities.…

Update! – The final phase of improvements from the Stinger Initiative were released as part of Hive 0.13 on Apr 21, 2014 – Read the announcement

While just a preview by moniker, the release marks a significant milestone in the transformation of Hadoop from a batch-oriented system to a data platform capable of interactive data processing at scale and delivering on the aims of the Stinger Initiative.

Apache Tez and SQL: Interactive Query-IN-Hadoop

Tez is a low-level runtime engine not aimed directly at data analysts or data scientists.…

Encryption is applied to electronic information in order to ensure its privacy and confidentiality.  Typically, we think of protecting data as it rests or in motion.  Wire Encryption protects the latter as data moves through Hadoop over RPC, HTTP, Data Transfer Protocol (DTP), and JDBC.

Let’s cover the configuration required to encrypt each of these protocols. To see the step-by-step instructions please see the HDP 2.0 documentation.

RPC Encryption

The most common way for a client to interact with a Hadoop cluster is through RPC.  …

Last week was a busy week for shipping code, so here’s a quick recap on the new stuff to keep you busy over the holiday season.

Apache Hadoop has always been very fussy about Java versions. It’s a big application running across tens of thousands of processes across thousands of machines in a single datacenter. This makes it almost inevitable that any race conditions and deadlock bugs in the code will eventually surface – be it in the Java JVM and libraries, in Hadoop itself, or in one of the libraries on which it depends.

Hence the phrase “there are no corner cases in a datacenter”.…

Apache Sqoop is a tool that transfers data between the Hadoop ecosystem and enterprise data stores. Sqoop does this by providing methods to transfer data to HDFS or Hive (using HCatalog). Oracle Database is one of the databases supported by Apache Sqoop. With Oracle Database, the database connection credentials are stored in Oracle Wallet. Oracle Wallet can act as the store of keys and secrets such as authentication credentials. This post describes how Oracle Wallet adds a secure authentication layer for Sqoop jobs.…

Security is a top agenda item and represents critical requirements for Hadoop projects. Over the years, Hadoop has evolved to address key concerns regarding authentication, authorization, accounting, and data protection natively within a cluster and there are many secure Hadoop clusters in production. Hadoop is being used securely and successfully today in sensitive financial services applications, private healthcare initiatives and in a range of other security-sensitive environments. As enterprise adoption of Hadoop grows, so do the security concerns and a roadmap to embrace and incorporate these enterprise security features has emerged.…

The Apache Tez team is proud to announce the first release of Apache Tez – version 0.2.0-incubating.

Apache Tez is an application framework which allows for a complex directed-acyclic-graph of tasks for processing data and is built atop Apache Hadoop YARN. You can learn much more from our Tez blog series tracked here.

Since entering the Apache Incubator project in late February of 2013, there have been over 400 tickets resolved, culminating in this significant release.…

We are very excited to announce that Apache Ambari has graduated out of Incubator and is now an Apache Top Level Project! Hortonworks introduced Ambari as an Apache Incubator project back in August 2011 with the vision of making Hadoop cluster management dead simple.  In little over two years, the development community grew significantly, from a small team in Hortonworks, to a large number of contributors from various organizations beyond Hortonworks; upon graduation, there were more than 60 contributors, 37 of whom had become committers.…

We believe the fastest path to innovation is the open community and we work hard to help deliver this innovation from the community to the enterprise.  However, this is a two way street. We are also hearing very distinct requirements being voiced by the broad enterprise as they integrate Hadoop into their data architecture.

Take a look at the Falcon Technical Preview and the Data Management Labs.

Open Source, Open Community & An Open Roadmap for Dataset Management

Over the past year, a set of enterprise requirements has emerged for dataset management.  …

Go to page:« First...34567...10...Last »