Hadoop 2 and its YARN-based architecture have increased interest in running new engines on Hadoop, and one such workload is in-memory computing for machine learning and data science use cases. Apache Spark has emerged as an attractive option for this type of processing, and today we announce the availability of Apache Spark as a Tech Preview component of HDP 2.1. This is a key addition to the platform and brings another YARN-supported workload to HDP.…
The Hortonworks Blog
This is the second post in our series on the motivations and architecture for improving the restart resiliency of the Apache Hadoop YARN ResourceManager. Others in the series are:
Introduction: Phase I – Preserve Application-queues
In the introductory post, we previewed what RM Restart Phase I entails. In essence, we preserve the application-queue state in a persistent store and reread it upon RM restart, eliminating the need for users to resubmit their applications.…
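To make the Phase I idea concrete, here is a toy Python model: the ResourceManager writes each submitted application to a persistent store before acknowledging it, and a restarted instance rereads that store instead of asking users to resubmit. The classes `FileStateStore` and `ToyResourceManager` are hypothetical and only illustrate the concept; they are not YARN's API. On a real cluster, recovery is switched on with `yarn.resourcemanager.recovery.enabled` and the backing store is chosen with `yarn.resourcemanager.store.class` in yarn-site.xml.

```python
import json
import os
import tempfile


class FileStateStore:
    """Hypothetical toy store: one JSON file holding submitted applications.

    Stands in for a real RM state store (e.g. ZooKeeper- or HDFS-backed).
    """

    def __init__(self, path):
        self.path = path

    def load(self):
        # Return every application saved before the restart (empty on first start).
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, apps):
        # Write to a temp file, then rename, so a crash mid-write cannot
        # leave a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(apps, f)
        os.replace(tmp, self.path)


class ToyResourceManager:
    def __init__(self, store):
        self.store = store
        # Phase I idea: on start, reread the application-queue state.
        self.apps = store.load()

    def submit(self, app_id, queue):
        self.apps[app_id] = {"queue": queue, "state": "SUBMITTED"}
        self.store.save(self.apps)  # persist before acknowledging the client

    def finish(self, app_id):
        self.apps[app_id]["state"] = "FINISHED"
        self.store.save(self.apps)

    def pending(self):
        # Unfinished applications, i.e. the ones to relaunch after a restart.
        return sorted(a for a, info in self.apps.items() if info["state"] != "FINISHED")


path = os.path.join(tempfile.mkdtemp(), "rm-state.json")
rm = ToyResourceManager(FileStateStore(path))
rm.submit("app_1", "default")
rm.submit("app_2", "analytics")
rm.finish("app_1")

# Simulate an RM restart: a fresh instance reads the same store.
restarted = ToyResourceManager(FileStateStore(path))
print(restarted.pending())  # ['app_2']
```

The key design point is ordering: state is persisted before the submission is acknowledged, so anything a client believes was accepted survives a restart.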
Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates deploying a multi-node Hadoop cluster using the MSI installer available in HDP 2.1 for Windows.
HDP on Windows MSI Overview
The HDP on Windows installation package comes in MSI format. MSI packages are processed by Windows Installer, the installation and configuration service provided with Windows.…
The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.
Knox Gateway 0.4.0 adds the features that enterprise security officers expect in a gateway solution:
- Perimeter security for a Hadoop cluster
- Support for enterprise group lookup
- Audit log of all gateway activity
- Command line tooling for CMF provisioning
- Protection against web application vulnerabilities
- Pre-authentication via SSO token
- And many more…
As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…
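Perimeter security means REST clients address a single gateway endpoint rather than individual cluster hosts. The sketch below builds a WebHDFS URL routed through Knox, following the URL pattern in the Knox user guide, `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/v1/...`, where the cluster name is the Knox topology name. The helper `knox_webhdfs_url` is hypothetical, and the port `8443` and path `gateway` are assumed defaults; check your own gateway configuration.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     gateway_port=8443, gateway_path="gateway", **params):
    """Hypothetical helper: build a WebHDFS URL that goes through a Knox gateway.

    gateway_port and gateway_path are assumed defaults, not guaranteed values.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # Percent-encode the path (keeping '/') and the query parameters.
    query = urlencode({"op": op, **params})
    return (f"https://{gateway_host}:{gateway_port}/{gateway_path}/"
            f"{topology}/webhdfs/v1{quote(hdfs_path)}?{query}")


url = knox_webhdfs_url("localhost", "sandbox", "/", "LISTSTATUS")
print(url)  # https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS
```

Because every request passes through that one TLS endpoint, authentication, group lookup, and auditing can be enforced in a single place instead of on each cluster service.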
Three weeks ago, we announced the availability of the technical preview of Hortonworks Data Platform (HDP) version 2.1, and since then we have had thousands of downloads of this preview. We also promised delivery of GA bits on April 22nd, and we are delighted to deliver as promised. HDP 2.1, which includes countless new features across seven new components, is available today from our download page.
YARN unlocks the Data Lake
As enterprises build new applications with the data they cost-effectively capture and process with Apache Hadoop, it is important for the platform to facilitate the application development process. That’s why we are excited to announce that we’ve expanded our partnership with Concurrent, Inc. to simplify and accelerate application development on Hadoop.
There are two components to this expanded partnership.
Securing any system requires implementing layers of protection. Access Control Lists (ACLs) are typically applied to data to restrict access to approved entities, and applying ACLs at every layer of data access is critical to securing a system. The layers for Hadoop are depicted in this diagram, and in this post we cover the lowest level of access: ACLs for HDFS.
This is part of the HDFS Developer Trail series. …
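HDFS ACLs follow POSIX-style semantics, so the order in which entries are checked matters. The Python model below sketches the permission-check order described in the Apache Hadoop HDFS Permissions Guide: owner, then named users (filtered by the mask), then the file group and named groups (filtered by the mask, with no fall-through to "other" once a group matches), then "other". The `AclFile` class and `check_access` function are hypothetical and model only the decision logic; superuser bypass and default ACLs are not modeled. On a real cluster, ACLs are enabled with `dfs.namenode.acls.enabled` in hdfs-site.xml and managed with `hdfs dfs -setfacl` and `hdfs dfs -getfacl`.

```python
from dataclasses import dataclass, field

# Permission bits as in POSIX: r=4, w=2, x=1.
R, W, X = 4, 2, 1


@dataclass
class AclFile:
    owner: str
    group: str
    owner_perm: int
    group_perm: int          # unnamed (file) group entry
    other_perm: int
    mask: int = 7            # caps named users and all group entries
    named_users: dict = field(default_factory=dict)   # user -> perm bits
    named_groups: dict = field(default_factory=dict)  # group -> perm bits


def check_access(f, user, groups, wanted):
    """Hypothetical model: may `user` (member of `groups`) perform `wanted`?"""
    def grants(perm):
        return (perm & wanted) == wanted

    if user == f.owner:
        return grants(f.owner_perm)          # owner entry is never masked
    if user in f.named_users:
        return grants(f.named_users[user] & f.mask)
    # Collect every group entry (file group or named group) the user matches.
    matched = [f.group_perm] if f.group in groups else []
    matched += [p for g, p in f.named_groups.items() if g in groups]
    if matched:
        # Granted if any matching group entry, filtered by the mask, allows it;
        # otherwise denied without falling through to "other".
        return any(grants(p & f.mask) for p in matched)
    return grants(f.other_perm)


f = AclFile(owner="bruce", group="sales", owner_perm=R | W, group_perm=R,
            other_perm=0, mask=R | X,
            named_users={"diana": R | W}, named_groups={"execs": R | W})
print(check_access(f, "diana", [], R))          # True: r survives the r-x mask
print(check_access(f, "diana", [], W))          # False: the mask strips w
print(check_access(f, "clark", ["execs"], W))   # False: group matched, mask denies
```

The last case shows why the mask matters: even though the `execs` entry lists `rw-`, the effective permission is only `r--`.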
Today we are proud to announce the formation of a terrific partnership with LucidWorks to bring search to the Hortonworks Data Platform. LucidWorks delivers an enterprise-grade search development platform built atop the power of Apache Solr.
Shared Vision and New Scenarios
Both LucidWorks and Hortonworks have a shared vision of innovating in open source and delivering it to customers in an enterprise grade platform.
Our continuing mission is to build a completely open, versatile enterprise data platform across many data processing scenarios, and Solr contributes a simple yet powerful interface providing advanced search capabilities.…
If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, we’ve included 4 tutorials for you to try out, Sandbox-style.
You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.
Interactive Query with Apache Hive and Apache Tez
OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez, compare its performance against MapReduce, and explore features such as vectorized query execution and the host of new SQL features.…
The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.
What’s In Hortonworks Data Platform 2.1?
The advancements in HDP 2.1 span every aspect of Enterprise Hadoop: data management, data access, integration & governance, security, and operations. …
There is no doubt that enterprises recognize how crucial Big Data is to monetizing their business. The information contained in the volumes of data collected can offer key insights into product, customer, and competitive trends. There are a variety of sophisticated tools for Big Data analytics and processing, but most Big Data implementations rely on rudimentary technologies such as FTP-based scripts for data collection and distribution.
Although FTP is a widely used protocol, there is an inherent lack of reliability in this approach. …
Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.
InMobi is one of the largest Hadoop users in the world, and their team began the project 2 years ago. At the time, InMobi was processing billions of ad-server events in Hadoop every day.…
We are excited to welcome BlackRock and Passport Capital as Hortonworks investors. Today they led a $100M round of funding, with continued participation from all existing investors.
This latest round of funding will allow us to double down on our founding strategy: to make open source Apache Hadoop a true enterprise data platform. To that end, we are focused on two areas:…
1. Lead the innovation of Hadoop. In open source, for everyone.
LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS-level configuration to read LDAP groups. The second is to explicitly configure Hadoop to use LDAP-based group mapping.
Here is an overview of the steps to explicitly configure Hadoop to use groups stored in LDAP.
- Create Hadoop service accounts in LDAP
- Shut down HDFS NameNode & YARN ResourceManager
- Modify core-site.xml to point to LDAP for group mapping
- Restart HDFS NameNode & YARN ResourceManager
- Verify LDAP based group mapping
Prerequisites: access to LDAP, with the connection details at hand.…
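For the "modify core-site.xml" step, the Python sketch below renders the Hadoop `LdapGroupsMapping` properties as a `<configuration>` XML fragment. The property names are the ones Hadoop 2 defines for LDAP group mapping, and the two search filters shown are Hadoop's Active Directory-style defaults; every other value (host, bind DN, password file path, search base) is a placeholder assumption to replace with your own LDAP connection details.

```python
import xml.etree.ElementTree as ET

# Hadoop LdapGroupsMapping property names; values are placeholders.
LDAP_GROUP_MAPPING = {
    "hadoop.security.group.mapping":
        "org.apache.hadoop.security.LdapGroupsMapping",
    "hadoop.security.group.mapping.ldap.url": "ldap://ldap.example.com:389",
    "hadoop.security.group.mapping.ldap.bind.user":
        "cn=hadoop-svc,ou=services,dc=example,dc=com",
    "hadoop.security.group.mapping.ldap.bind.password.file":
        "/etc/hadoop/conf/ldap-bind.password",
    "hadoop.security.group.mapping.ldap.base": "dc=example,dc=com",
    "hadoop.security.group.mapping.ldap.search.filter.user":
        "(&(objectClass=user)(sAMAccountName={0}))",
    "hadoop.security.group.mapping.ldap.search.filter.group":
        "(objectClass=group)",
    "hadoop.security.group.mapping.ldap.search.attr.member": "member",
    "hadoop.security.group.mapping.ldap.search.attr.group.name": "cn",
}


def core_site_fragment(props):
    """Render properties as a Hadoop <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ElementTree escapes '&' in the filters as '&amp;', as XML requires.
    return ET.tostring(root, encoding="unicode")


print(core_site_fragment(LDAP_GROUP_MAPPING))
```

After merging the properties into core-site.xml and restarting the services, the mapping can be checked with `hdfs groups <username>`, which prints the groups Hadoop resolves for that user.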
Luminar is one of Hortonworks’ original customers. Apache Hadoop is a pillar of their modern data architecture, and since choosing Hortonworks in 2012, the Luminar team has become expert users of Hortonworks Data Platform version 1.
They were eager to migrate to HDP2 after it launched in October 2013.
I recently spoke with Juan Manuel Alonso, Luminar’s Manager of Insights. Juan Manuel worked with the Hortonworks professional services team to plan and execute the migration from HDP1 to HDP2.…