The Hortonworks Blog

Posts categorized by: Hadoop

Hortonworks introduces HDP Operations Ready, HDP Security Ready and HDP Governance Ready certifications to showcase solutions that deeply integrate with enterprise Hadoop.

Customer adoption of Apache Hadoop continues to accelerate the pace at which the community works to meet the requirements of Enterprise Hadoop. Hadoop was once the domain of HDFS and MapReduce only, but the introduction of Apache Hadoop YARN a little over a year ago has unleashed many new ways to get value from a Hadoop cluster.…

Hortonworks architects vertically integrate the projects within our Hadoop distribution with YARN and HDFS so that HDP can span batch, interactive, and real-time workloads across both open source and other data access technologies. In HDP 2.2, we deliver work to vertically integrate Apache Storm, Apache Accumulo and Apache HBase so that all of those long-running services run in Hadoop on YARN via Apache Slider.

The Apache Slider community recently released Apache Slider 0.60.0.…

On November 13th, Hortonworks presented the fourth of eight Discover HDP 2.2 webinars, hosted by Rohit Bakhshi, Jitendra Pandey, and Justin Sears.

Rohit Bakhshi and Jitendra Pandey introduced HDP and discussed how to use HDFS as a reliable, scalable, cost-efficient, and fault-tolerant distributed data storage platform for your Modern Data Architecture (MDA). They also covered new HDFS data storage innovations now included in HDP 2.2:

  • Heterogeneous storage
  • Encryption
  • Operational security enhancements

Here is the complete recording of the webinar.…

As we approach the opening bell on Nasdaq and another milestone for open source Apache Hadoop, we at Hortonworks want to thank those who have contributed deeply to this journey. We owe you – our customers – a huge thank you. Your active collaboration with us in the Apache Hadoop community has greatly impacted the trajectory of this platform for data management and has established a path for how thousands of other enterprises can successfully build a new open data architecture that brings all data under management.…

Many types of industries are finding new opportunities from an abundance of new types of data stored at scale in Hadoop, combined with Hadoop’s ability to process that data at lower costs than traditional platforms. Apache Hadoop and the Hortonworks Data Platform (HDP) can help enterprises turn what used to be data fumes into high-octane fuel that propels their businesses.

Sign up for the Hadoop industry solutions email series to find out how Hortonworks customers use Hadoop to solve real-world business challenges.…

The Stinger.next initiative, with its focus on transactions, sub-second queries and SQL:2011 Analytics, evolves Apache Hive to allow it to run most of the analytical workloads that are typical within a data warehouse, but now at petabyte scale. The first phase of Stinger.next, delivered in Apache Hive 0.14 and in HDP 2.2, provides transactions with ACID semantics, a critical step in the evolution of Hive as the de facto standard for SQL in Hadoop.…

The public sector is charged with protecting citizens, responding to constituents, providing services and maintaining infrastructure. In many instances, the demands of these responsibilities increase while government resources simultaneously shrink under budget pressures.

How can Intelligence, Defense and Civilian agencies do more with less?

Apache Hadoop is part of the answer. Within the public sector, Hadoop delivers data-driven actions in support of IT efficiency and good government.

Download the White Paper

In one example, the United States Internal Revenue Service had to reduce its auditor headcount due to budget cuts.…

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. Apache Ranger provides many of these, with central security policy administration across authorization, accounting and data protection.…

The architecture of Hortonworks Data Platform (HDP) matches the blueprint for Enterprise Apache Hadoop, with data management, data access, governance, operations and security. This post focuses on one of those core components: security. Specifically, we will focus on Apache Knox Gateway for securing access to the Hadoop REST APIs.

Pseudo Federation Provider

This blog will walk through the process of adding a new provider for establishing the identity of a user.…

With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. More and more independent software vendors (ISVs) are developing applications to run in Hadoop via YARN. This increases the number of users and processing engines that operate simultaneously across a Hadoop cluster, on the same data, at the same time.…

Introduction

In this second part of our series on Data Science and Apache Hadoop, with its accompanying IPython Notebook, we continue to demonstrate how to build a predictive model with Apache Hadoop, using existing modeling tools. This time we’ll use Apache Spark and MLlib.

Apache Spark is a relatively new entrant to the Hadoop ecosystem. Now running natively on Apache Hadoop YARN, the architectural center of Hadoop, Apache Spark is an in-memory data processing API and execution engine that is effective for machine learning and data science use cases.…

As more organizations consider the cloud as a component of their Apache Hadoop deployments, we can look to our partners for a range of solutions designed to meet these needs. This is the first post in a series on partner solutions available for deploying Hadoop in the cloud. We will build on the hybrid deployment post with general use cases for Hadoop in a hybrid cloud. Through our partners we have a broad set of options for the cloud available today, spanning on-premises, virtual and cloud-based deployments.…

Hadoop operations for provisioning, managing and monitoring a cluster are critical to the success of a Hadoop project, and an intuitive and effective set of tooling has become a foundational element of a Hadoop distribution. Within HDP, we provide completely open source Apache Ambari to help you be successful with Hadoop operations.

The rate of innovation in the Ambari community is astonishing, and this pace continues with the 7th release of the project this year alone, Apache Ambari 1.7.0.…

Data platforms within enterprises are in the midst of a generational shift. After successful reliance on databases for decades, leading organizations today are complementing their data platforms to create a Modern Data Architecture (MDA) with Apache Hadoop in a Data Lake environment. Hadoop, with its scale-out and schema-free architecture, enables organizations to store and analyze all their structured and unstructured data in a single consolidated data environment. A key partner in the Hadoop journey has been the complementary infrastructure of server, storage and networking.…

Our customers have many choices of infrastructure on which to deploy HDP: on-premises, cloud, virtualized and even as an appliance. Further, our customers have a choice of deploying on Linux and Windows operating systems. You can easily see that this creates a complex matrix. At Hortonworks, we believe you should not be limited to just one option but should be able to choose the best combination of infrastructure and operating system based on the usage scenario.…
