The Hortonworks Blog

Posts categorized by : Apache Hadoop

Securing any system requires you to implement layers of protection.  Access Control Lists (ACLs) are typically applied to data to restrict access to data to approved entities. Application of ACLs at every layer of access for data is critical to secure a system. The layers for hadoop are depicted in this diagram and in this post we will cover the lowest level of access… ACLs for HDFS.

This is part of the HDFS Developer Trail series.  …

Today we are proud to announce that the formation of a terrific partnership with LucidWorks to bring search to the Hortonworks Data Platform. LucidWorks delivers an enterprise-grade search development platform built atop the power of Apache Solr.

Presentation & Applications Enable both existing and new applications to provide value to the organization. Enterprise Management & Security Empower existing operations and security tools to manage Hadoop. Governance Integration Data Workflow, Lifecycle & Governance Data Access Access your data simultaneously in multiple ways (batch, interactive, real-time) Script SQL NoSQL Stream Search In-Mem Others...…

If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, then we’ve included 4 tutorials for you try out – Sandbox-style.

You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.

Interactive Query with Apache Hive and Apache Tez

OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez, and see the performance gains versus MapReduce, and also try out features such as Vectorized Query, and the host of new SQL features.…

The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.

Download HDP 2.1 Technical Preview Now

What’s In Hortonworks Data Platform 2.1? Presentation & Applications Enable both existing and new applications to provide value to the organization. Enterprise Management & Security Empower existing operations and security tools to manage Hadoop.…

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.

InMobi is one of the largest Hadoop users in the world, and their team began the project 2 years ago. At the time, InMobi was processing billions of ad-server events in Hadoop every day.…

We are excited to welcome Blackrock and Passport Capital as Hortonworks investors who today led a $100M round of funding with continued participation from all existing investors.

This latest round of funding will allow us to double-down on our founding strategy: to make open source Apache Hadoop a true enterprise data platform. To that end we are focused in two areas:

1. Lead the innovation of Hadoop. In open source, for everyone.…

In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on the processing of data. Hortonworks is already supporting customers using this important project today.

Many organizations have already used Storm, including our partner Yahoo! This version of Apache Storm (version 0.9.1) is:

  • Highly scalable. Like Hadoop, Storm scales linearly
  • Fault-tolerant. Automatically reassigns tasks if a node fails
  • Reliable. 

LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS level configuration to read LDAP groups. The second is to explicitly configure Hadoop to use LDAP-based group mapping.

Here is an overview of steps to configure Hadoop explicitly to use groups stored in LDAP.

  • Create Hadoop service accounts in LDAP
  • Shutdown HDFS NameNode & YARN ResourceManager
  • Modify core-site.xml to point to LDAP for group mapping
  • Re-start HDFS NameNode & YARN ResourceManager
  • Verify LDAP based group mapping

Prerequisites: Access to LDAP and the connection details are available.…

Hortonworks would like to congratulate Leslie Lamport on winning the 2013 Turing Award given by the Association of Computing Machinery. This award is essentially the equivalent of the Nobel Prize for computer science.  Among Lamport’s many and varied contributions to the field computer science are: TLA (Temporal Logic for Actions)LaTeX and PAXOS.

The latter of these, the PAXOS three phase consensus protocol, inspires the Zookeeper coordination service, and powers HBase and highly available HDFS.…

Due to the flourish of Apache Software Foundation projects that have emerged in recent years in and around the Apache Hadoop project, a common question I get from mainstream enterprises is: What is the definition of Hadoop?

Download our Whitepaper: Hadoop and a Modern Data Architecture.

This question goes beyond the Apache Hadoop project itself, since most folks know that it’s an open source technology borne out of the experience of web scale consumer companies such as Yahoo!, Facebook and others who were confronted with the need to store and process massive quantities of data.…

This blog post originally appeared here and is reproduced in its entirety here. Part 1 can be found here.

The HBase BlockCache is an important structure for enabling low latency reads. As of HBase 0.96.0, there are no less than three different BlockCache implementations to choose from. But how to know when to use one over the other? There’s a little bit of guidance floating around out there, but nothing concrete.…

The Apache Tez community has voted to release 0.3 of the software.

Apache™ Tez is a replacement of MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important release towards making the software ready for wider adoption by focussing on fundamentals and ironing out several key functions. The major action areas in this release were

  • Security. Apache Tez now works on secure Hadoop 2.x clusters using the built-in security mechanisms of the Hadoop ecosystem.
  • The Apache Software Foundation (ASF) provides valuable stewardship and guide-rails for projects interested in attracting the broadest community of involvement as possible, especially across a wide range of vendors and end users. While the ASF’s role is not about guaranteeing wild success for every project, they do a great job of providing a place where the broadest community of people, ideas, and code can come together and raise an elephant, so to speak.…

    Elasticsearch’s engine integrates with Hortonworks Data Platform 2.0 and YARN to provide real-time search and access to information in Hadoop.

    See it in action:  register for the Hortonworks and Elasticsearch webinar on March 5th 2014 at 10 am PST/1pm EST to see the demo and an outline for best practices when integrating Elasticsearch and HDP 2.0 to extract maximum insights from your data.  Click here to register for this exciting and informative webinar!…

    It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.3.0!

    hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS.

    With this release, there are two significant enhancements to HDFS:

    • Support for Heterogeneous Storage Hierarchy in HDFS (HDFS-2832)
    • In-memory Cache for data resident in HDFS via Datanodes (HDFS-4949)

    With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters.…

    Go to page:« First...45678...20...Last »