From the Dev Team

Follow the latest developments from our technical team

The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.

The new features in Knox Gateway 0.4.0 are the features that enterprise security officers expect in a gateway solution:

  • Perimeter security for a Hadoop cluster
  • Support for enterprise group lookup
  • Audit log of all gateway activity
  • Command line tooling for CMF provisioning
  • Protection for web application vulnerabilities
  • Pre-authentication via SSO token
  • And many more…

As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…

Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, resolves more than 1,000 JIRAs.

This version of Ambari makes huge strides in simplifying the deployment, management and monitoring of large Hadoop clusters, including those running Hortonworks Data Platform 2.1.…

The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets.

Hive 0.13 also delivers the third and final phase of the Stinger Initiative, a broad community based initiative to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.…

The power of a well-crafted speech is indisputable, for words matter—they inspire to act. And so is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter—they simplify to write code.

In 2007, when Chris Wensel, the author of Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write Enterprise Big Data applications in MapReduce will be difficult and convincing developers to write directly to the MapReduce API was a potential blocker.…

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.4.0! Thank you to every single one of the contributors, reviewers and testers!

The community fixed 411 JIRAs for 2.4.0 (on top of the 511 JIRAs resolved for 2.3.0). Of the 411 fixes:

  • 50 are in Hadoop Common,
  • 171 are in HDFS,
  • 160 are in YARN and
  • 30 went into MapReduce

Hadoop 2.4.0 is the second Hadoop release in 2014, following Hadoop 2.3.0’s February release and its key enhancements to HDFS such as Support for Heterogeneous Storage and In-Memory Cache.…

We are excited to announce that the Apache™ Tez community voted to release version 0.4 of the software.

Apache Tez is an alternative to MapReduce that provides a powerful framework for executing a complex topology of tasks for data access in Hadoop. Version 0.4 incorporates the feedback from extensive testing of Tez 0.3, released just last month.

This release is especially meaningful because it coincides with completion of the Stinger Initiative (a collaborative community effort involving 145 developers across 44 companies) and the upcoming release of Apache Hive 0.13.…

Securing any system requires you to implement layers of protection.  Access Control Lists (ACLs) are typically applied to data to restrict access to data to approved entities. Application of ACLs at every layer of access for data is critical to secure a system. The layers for hadoop are depicted in this diagram and in this post we will cover the lowest level of access… ACLs for HDFS.

This is part of the HDFS Developer Trail series.  …

If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, then we’ve included 4 tutorials for you try out – Sandbox-style.

You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.

Interactive Query with Apache Hive and Apache Tez

OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez, and see the performance gains versus MapReduce, and also try out features such as Vectorized Query, and the host of new SQL features.…

The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.

Download HDP 2.1 Technical Preview Now

What’s In Hortonworks Data Platform 2.1?

The advancements in HDP 2.1 span every aspect of Enterprise Hadoop: from data management, data access, integration & governance, security and operations. …

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.

InMobi is one of the largest Hadoop users in the world, and their team began the project 2 years ago. At the time, InMobi was processing billions of ad-server events in Hadoop every day.…

In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on the processing of data. Hortonworks is already supporting customers using this important project today.

Many organizations have already used Storm, including our partner Yahoo! This version of Apache Storm (version 0.9.1) is:

  • Highly scalable. Like Hadoop, Storm scales linearly
  • Fault-tolerant. Automatically reassigns tasks if a node fails
  • Reliable. 

LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS level configuration to read LDAP groups. The second is to explicitly configure Hadoop to use LDAP-based group mapping.

Here is an overview of steps to configure Hadoop explicitly to use groups stored in LDAP.

  • Create Hadoop service accounts in LDAP
  • Shutdown HDFS NameNode & YARN ResourceManager
  • Modify core-site.xml to point to LDAP for group mapping
  • Re-start HDFS NameNode & YARN ResourceManager
  • Verify LDAP based group mapping

Prerequisites: Access to LDAP and the connection details are available.…

Hortonworks would like to congratulate Leslie Lamport on winning the 2013 Turing Award given by the Association of Computing Machinery. This award is essentially the equivalent of the Nobel Prize for computer science.  Among Lamport’s many and varied contributions to the field computer science are: TLA (Temporal Logic for Actions)LaTeX and PAXOS.

The latter of these, the PAXOS three phase consensus protocol, inspires the Zookeeper coordination service, and powers HBase and highly available HDFS.…

This blog post originally appeared here and is reproduced in its entirety here. Part 1 can be found here.

The HBase BlockCache is an important structure for enabling low latency reads. As of HBase 0.96.0, there are no less than three different BlockCache implementations to choose from. But how to know when to use one over the other? There’s a little bit of guidance floating around out there, but nothing concrete.…

The Apache Tez community has voted to release 0.3 of the software.

Apache™ Tez is a replacement of MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important release towards making the software ready for wider adoption by focussing on fundamentals and ironing out several key functions. The major action areas in this release were

  • Security. Apache Tez now works on secure Hadoop 2.x clusters using the built-in security mechanisms of the Hadoop ecosystem.…
  • Go to page:12345...10...Last »

    Thank you for subscribing!