The Hortonworks Blog

Posts categorized by : HDP

Hadoop 2 and its YARN-based architecture has increased the interest in new engines to be run on Hadoop and one such workload is in-memory computing for machine learning and data science use cases. Apache Spark has emerged as an attractive option for this type of processing and today, we announce availability of our HDP 2.1 Tech Preview Component of Apache Spark.  This is a key addition to the platform and brings another workload supported by YARN on HDP.…

The first use of the term BoF session was used at the Digital Equipment Users’ Society (DECUS) conference in the 1960s. Its essence was to bring together like minds and thought leaders—just as birds of the feather flock together— to share and exchange computing ideas, in an informal yet spirited way. Since then, the organizers and sponsors of most computing conferences have been loyal to its essence and spirit.

For ideas and innovation happen in collaboration—not in isolation. …

This is the second in our series on the motivations and architecture for improvements to the Apache Hadoop YARN’s Resource Manager Restart resiliency. Other in the series are:

Introduction: Phase I – Preserve Application-queues

In the introductory blog, we previewed what RM Restart Phase I entails. In essence, we preserve the application-queue state into a persistent store and reread it upon RM restart, eliminating the need for users to resubmit their applications.…

Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates the process of deploying a multi-node Hadoop cluster – utilizing the MSI available in HDP 2.1 for Windows.

Download HDP 2.1 for Windows

HDP on Windows MSI Overview

HDP on Windows installation package comes in the format of MSI, Microsoft’s MSI format utilizes the installation and configuration service provided with Windows called Windows Installer.…

The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox Gateway is now better positioned to provide complete security for REST API access to a Hadoop cluster.

The new features in Knox Gateway 0.4.0 are the features that enterprise security officers expect in a gateway solution:

  • Perimeter security for a Hadoop cluster
  • Support for enterprise group lookup
  • Audit log of all gateway activity
  • Command line tooling for CMF provisioning
  • Protection for web application vulnerabilities
  • Pre-authentication via SSO token
  • And many more…

As a top-level project, Apache Knox Gateway is fully endorsed by the Apache Software Foundation, and this improves coordination between development of Knox and the other core Hadoop projects with which it interacts.…

Three weeks ago, we announced availability of the technical preview of Hortonworks Data Platform (HDP) version 2.1 and since then we have had thousands of downloads of this preview.  We also promised delivery of GA bits on April 22nd  and we are delighted to deliver as stated. HDP 2.1, which includes countless new features across seven new components, is available today from our download page

YARN unlocks the Data Lake

YARN, the resource management layer of Hadoop 2 is delivering value as it has unlocked the data lake vision for many.…

The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets.

Hive 0.13 also delivers the third and final phase of the Stinger Initiative, a broad community based initiative to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.…

The power of a well-crafted speech is indisputable, for words matter—they inspire to act. And so is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical constructs in a programming language matter—they simplify to write code.

In 2007, when Chris Wensel, the author of Cascading Java API, was evaluating Hadoop, he had a couple of prescient insights. First, he observed that finding Java developers to write Enterprise Big Data applications in MapReduce will be difficult and convincing developers to write directly to the MapReduce API was a potential blocker.…

As enterprises build new applications with the data they cost effectively capture and process with Apache Hadoop it is important for the platform to facilitate the app dev processes. That’s why we are excited to announce that we’ve expanded our partnership with Concurrent, Inc. to simplify and accelerate application development on Hadoop.

There are two components to this expanded partnership.

Securing any system requires you to implement layers of protection.  Access Control Lists (ACLs) are typically applied to data to restrict access to data to approved entities. Application of ACLs at every layer of access for data is critical to secure a system. The layers for hadoop are depicted in this diagram and in this post we will cover the lowest level of access… ACLs for HDFS.

This is part of the HDFS Developer Trail series.  …

Yesterday our partner Teradata announced a new capability called Teradata QueryGrid that further deepens the integration between the Teradata Data Warehouse and the Hortonworks Data Platform. This announcement is important because it delivers on the promise and the value of the Modern Data Architecture by demonstrating how the two technologies complement each other for the enterprise.

Teradata pioneered deeper integration with Apache Hadoop through integration with H-Catalog initially with Aster SQL-H and then the Data Warehouse and now they have taken it to the next level with Teradata QueryGrid.…

If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, then we’ve included 4 tutorials for you try out – Sandbox-style.

You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.

Interactive Query with Apache Hive and Apache Tez

OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez, and see the performance gains versus MapReduce, and also try out features such as Vectorized Query, and the host of new SQL features.…

The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from the Hadoop community in an integrated, tested, and completely open enterprise data platform.

Download HDP 2.1 Technical Preview Now

What’s In Hortonworks Data Platform 2.1? Presentation & Applications Enable both existing and new applications to provide value to the organization. Enterprise Management & Security Empower existing operations and security tools to manage Hadoop.…

There is no doubt that enterprises recognize how Big Data is crucial to monetizing their business.  The information contained in the volumes of data collected can offer key insights into product, customer and competitive trends.  There are a variety of sophisticated tools for Big Data analytics and processing but most big data implementations are based on rudimentary technologies like FTP based scripts for data collection and distribution.

Although FTP is a widely used protocol, there is an inherent lack of reliability in this approach.  …

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.

InMobi is one of the largest Hadoop users in the world, and their team began the project 2 years ago. At the time, InMobi was processing billions of ad-server events in Hadoop every day.…

Go to page:« First...23456...10...Last »