Three weeks ago, we announced availability of the technical preview of Hortonworks Data Platform (HDP) version 2.1 and since then we have had thousands of downloads of this preview. We also promised delivery of GA bits on April 22nd and we are delighted to deliver as stated. HDP 2.1, which includes countless new features across seven new components, is available today from our download page.
The Hortonworks Blog
The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets.
Hive 0.13 also delivers the third and final phase of the Stinger Initiative, a broad community-based initiative to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.…
The power of a well-crafted speech is indisputable, for words matter: they inspire people to act. The same is true of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter: they make code simpler to write.
In 2007, when Chris Wensel, the author of the Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write Enterprise Big Data applications in MapReduce would be difficult, and that convincing developers to write directly to the MapReduce API was a potential blocker.…
As enterprises build new applications with the data they cost-effectively capture and process with Apache Hadoop, it is important for the platform to facilitate the application development process. That’s why we are excited to announce that we’ve expanded our partnership with Concurrent, Inc. to simplify and accelerate application development on Hadoop.
There are two components to this expanded partnership.
Securing any system requires you to implement layers of protection. Access Control Lists (ACLs) are typically applied to data to restrict access to approved entities. Applying ACLs at every layer of data access is critical to securing a system. The layers for Hadoop are depicted in this diagram, and in this post we will cover the lowest level of access… ACLs for HDFS.
This is part of the HDFS Developer Trail series. …
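To make the idea of restricting access to approved entities concrete, here is a minimal sketch of an ACL check. It is a simplified, POSIX-style model written for illustration, not HDFS's actual evaluation logic: it ignores the file owner and the mask entry, and the names `AclEntry`, `parse_acl`, and `is_allowed` are hypothetical.

```python
from typing import List, NamedTuple, Set


class AclEntry(NamedTuple):
    kind: str   # "user", "group", or "other"
    name: str   # empty string for "other"
    perms: str  # three characters such as "r-x"


def parse_acl(spec: str) -> List[AclEntry]:
    """Parse a comma-separated spec such as 'user:bruce:r--,other::---'."""
    entries = []
    for part in spec.split(","):
        kind, name, perms = part.strip().split(":")
        entries.append(AclEntry(kind, name, perms))
    return entries


def is_allowed(entries: List[AclEntry], user: str, groups: Set[str], want: str) -> bool:
    """Return True if `user` may perform `want` ('r', 'w', or 'x').

    Simplified rule order (an assumption of this sketch):
    1. A named-user entry for `user` decides on its own.
    2. Otherwise, if any group entry matches, access is granted only
       if at least one matching group entry grants `want`.
    3. Otherwise the "other" entry decides; no entry means deny.
    """
    for entry in entries:
        if entry.kind == "user" and entry.name == user:
            return want in entry.perms

    matching = [e for e in entries if e.kind == "group" and e.name in groups]
    if matching:
        return any(want in e.perms for e in matching)

    for entry in entries:
        if entry.kind == "other":
            return want in entry.perms
    return False


if __name__ == "__main__":
    acl = parse_acl("user:bruce:r--,group:sales:rw-,other::---")
    print(is_allowed(acl, "bruce", {"sales"}, "w"))  # named-user entry wins
    print(is_allowed(acl, "diana", {"sales"}, "w"))  # granted through group
    print(is_allowed(acl, "eve", set(), "r"))        # falls through to other
```

On a real cluster, ACLs are managed with `hdfs dfs -setfacl` and inspected with `hdfs dfs -getfacl`; the user and group names above are placeholders.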
Yesterday our partner Teradata announced a new capability called Teradata QueryGrid that further deepens the integration between the Teradata Data Warehouse and the Hortonworks Data Platform. This announcement is important because it delivers on the promise and the value of the Modern Data Architecture by demonstrating how the two technologies complement each other for the enterprise.
Teradata pioneered deeper integration with Apache Hadoop through HCatalog, initially with Aster SQL-H and then with the Data Warehouse, and now they have taken it to the next level with Teradata QueryGrid.…
If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, then we’ve included 4 tutorials for you to try out – Sandbox-style.
You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.
Interactive Query with Apache Hive and Apache Tez
OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez, and see the performance gains versus MapReduce, and also try out features such as Vectorized Query, and the host of new SQL features.…
The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.
There is no doubt that enterprises recognize how Big Data is crucial to monetizing their business. The information contained in the volumes of data collected can offer key insights into product, customer and competitive trends. There are a variety of sophisticated tools for Big Data analytics and processing, but most Big Data implementations are based on rudimentary technologies like FTP-based scripts for data collection and distribution.
Although FTP is a widely used protocol, there is an inherent lack of reliability in this approach. …
Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.
InMobi is one of the largest Hadoop users in the world, and their team began the project 2 years ago. At the time, InMobi was processing billions of ad-server events in Hadoop every day.…
We are excited to welcome BlackRock and Passport Capital as Hortonworks investors. Today they led a $100M round of funding, with continued participation from all existing investors.
This latest round of funding will allow us to double down on our founding strategy: to make open source Apache Hadoop a true enterprise data platform. To that end we are focused on two areas:
1. Lead the innovation of Hadoop. In open source, for everyone.…
In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on the processing of data. Hortonworks is already supporting customers using this important project today.
Many organizations have already used Storm, including our partner Yahoo! This version of Apache Storm (version 0.9.1) is:
- Highly scalable. Like Hadoop, Storm scales linearly
- Fault-tolerant. Automatically reassigns tasks if a node fails
LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS level configuration to read LDAP groups. The second is to explicitly configure Hadoop to use LDAP-based group mapping.
Here is an overview of steps to configure Hadoop explicitly to use groups stored in LDAP.
- Create Hadoop service accounts in LDAP
- Shut down HDFS NameNode & YARN ResourceManager
- Modify core-site.xml to point to LDAP for group mapping
- Restart HDFS NameNode & YARN ResourceManager
- Verify LDAP-based group mapping
Prerequisites: Access to LDAP and the connection details are available.…
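As a rough illustration of the core-site.xml step above, the sketch below renders the kind of properties that explicit LDAP group mapping relies on. The property names are Hadoop's `hadoop.security.group.mapping*` settings; the URL, bind user, and search base are hypothetical placeholders, and the helper `render_properties` is not part of Hadoop. A real setup also needs bind credentials and, depending on the directory schema, search filters.

```python
from typing import Dict
from xml.sax.saxutils import escape

# Hypothetical connection details; replace with values for your own LDAP server.
LDAP_GROUP_MAPPING: Dict[str, str] = {
    # Switch group lookup from the OS to Hadoop's LDAP-based mapping class.
    "hadoop.security.group.mapping": "org.apache.hadoop.security.LdapGroupsMapping",
    "hadoop.security.group.mapping.ldap.url": "ldap://ldap.example.com:389",
    "hadoop.security.group.mapping.ldap.bind.user": "cn=hadoop,ou=services,dc=example,dc=com",
    "hadoop.security.group.mapping.ldap.base": "dc=example,dc=com",
}


def render_properties(props: Dict[str, str]) -> str:
    """Render a <configuration> block with one <property> per entry."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the fragment so it can be merged into core-site.xml by hand.
    print(render_properties(LDAP_GROUP_MAPPING))
```

After the NameNode and ResourceManager are restarted, one way to check the result is `hdfs groups <username>`, which should list the groups resolved for that user.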
If there’s one thing my interactions with our customers have taught me, it’s that Apache Hadoop didn’t disrupt the datacenter, the data did. The explosion of new types of data in recent years has put tremendous pressure on the datacenter, both technically and financially, and an architectural shift is underway where Enterprise Hadoop is playing a key role in the resulting modern data architecture.
Because so many Apache Software Foundation projects have emerged in recent years in and around the Apache Hadoop project, a common question I get from mainstream enterprises is: What is the definition of Hadoop?
This question goes beyond the Apache Hadoop project itself, since most folks know that it’s an open source technology born out of the experience of web-scale consumer companies such as Yahoo!, Facebook and others who were confronted with the need to store and process massive quantities of data.…