HDP is the industry's only truly secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust analytics that accelerate decision making and innovation.
YARN and Hadoop Distributed File System (HDFS) are the cornerstone components of Hortonworks Data Platform (HDP). While HDFS provides the scalable, fault-tolerant, cost-efficient storage for your big data lake, YARN provides the centralized architecture that enables you to process multiple workloads simultaneously. YARN provides the resource management and pluggable architecture for enabling a wide variety of data access methods.
Data streaming, processing and analytics engines for a variety of workloads
Hortonworks Data Platform includes a versatile range of processing engines that let you interact with the same data in multiple ways, at the same time. This means applications can interact with the data in the best way: from batch to interactive SQL or low-latency access with NoSQL. Emerging use cases for data science, search and streaming are also supported with Apache Spark, Storm and Kafka.
HDP extends data access and management with powerful tools for data governance and integration. They provide a reliable, repeatable, and simple framework for managing the flow of data in and out of Hadoop. This control structure, along with a set of tooling to ease and automate the application of schema or metadata on sources, is critical for successful integration of Hadoop into your modern data architecture.
Hortonworks has engineering relationships with many leading data management providers to enable their tools to work and integrate with HDP.
Authentication, authorization, and data protection
Security is integrated into HDP in multiple layers. Critical features for authentication, authorization, accountability and data protection help secure HDP across these key requirements. Consistent with this approach throughout all of the enterprise Hadoop capabilities, HDP also ensures you can integrate and extend your current security solutions to provide a single, consistent, secure umbrella over your modern data architecture.
Operations teams deploy, monitor and manage a Hadoop cluster within their broader enterprise data ecosystem. Apache Ambari simplifies this experience. Ambari is an open source management platform for provisioning, managing, monitoring, and securing the Hortonworks Data Platform. It enables Hadoop to fit seamlessly into your enterprise environment.
Provision and manage Hadoop clusters in any cloud environment
Cloudbreak, as part of Hortonworks Data Platform and powered by Apache Ambari, simplifies the provisioning of clusters in any cloud environment, including Amazon Web Services, Microsoft Azure, Google Cloud Platform and OpenStack. It optimizes your use of cloud resources as workloads change.
Classification-based Policy. Assign access to data assets based on reusable metadata tags such as PCI or PII.
Location-based Policy. Customize entitlements based on geography. A user trying to access the same data from different locations would be subject to entitlements specific to each geographical context.
Data Expiry-based Policy. Assign expiration dates to data tags to automatically deny users access to the tagged data after the expiration date.
Prohibition-based Policy. Define security policy that restricts combining two data sets to help avoid privacy violations.
Row Level Security & Dynamic Data Masking. Restrict row access and anonymize sensitive data in real-time in Hive based on user characteristics and runtime context.
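The classification, location, expiry and prohibition policies above can be pictured as a single access check over the tags attached to a data asset. The following is a minimal sketch of that idea only. Every name in it (`Tag`, `Request`, `Policy`, `is_allowed`) is hypothetical and chosen for illustration. It is not the Apache Ranger or Apache Atlas API, and it assumes a default-deny model in which a tag with no group mapping grants no access.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical illustration only: none of these names come from Ranger or Atlas.


@dataclass(frozen=True)
class Tag:
    name: str                       # reusable metadata tag, e.g. "PII" or "PCI"
    expires: Optional[date] = None  # optional expiration date for tagged data


@dataclass
class Request:
    user_groups: set   # groups the requesting user belongs to
    location: str      # where the request originates
    tags: list         # tags on the requested data asset
    today: date        # evaluation date, passed in so the check is testable
    joined_tags: set = field(default_factory=set)  # tag names of data sets combined in the same query


@dataclass
class Policy:
    allowed_groups: dict     # tag name -> groups permitted (classification-based)
    allowed_locations: dict  # tag name -> locations permitted (location-based)
    forbidden_pairs: set     # frozensets of tag names that must not be combined


def is_allowed(policy: Policy, req: Request) -> bool:
    """Return True only if every tag on the asset passes all four checks."""
    for tag in req.tags:
        # Classification-based: the user needs a group mapped to this tag
        # (assumed default deny when the tag has no mapping).
        if not (req.user_groups & policy.allowed_groups.get(tag.name, set())):
            return False
        # Location-based: restrict only tags that list permitted locations.
        locations = policy.allowed_locations.get(tag.name)
        if locations is not None and req.location not in locations:
            return False
        # Data expiry-based: deny once the tag's expiration date has passed.
        if tag.expires is not None and req.today > tag.expires:
            return False
        # Prohibition-based: deny if this tag is combined with a forbidden one.
        for other in req.joined_tags:
            if frozenset({tag.name, other}) in policy.forbidden_pairs:
                return False
    return True


policy = Policy(
    allowed_groups={"PII": {"compliance"}},
    allowed_locations={"PII": {"US"}},
    forbidden_pairs={frozenset({"PII", "GEOLOCATION"})},
)
pii = Tag("PII", expires=date(2016, 12, 31))
request = Request({"compliance"}, "US", [pii], date(2016, 6, 1))
print(is_allowed(policy, request))  # True
```

In this sketch the same request is denied if it comes from a location outside the permitted set, arrives after the tag's expiration date, or joins the PII-tagged asset with a data set carrying the hypothetical `GEOLOCATION` tag.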
For Hadoop operators
Role-Based Access Control. Apache Ambari 2.4 includes additional cluster operational roles to provide more granular division of control for cluster operations.
Log Search (Technical preview). Automatically configures the collection of cluster operational metrics to aid with analysis and troubleshooting by including a new Log Search service.
Customizable Cluster Alerts. Tailor HDP to fit with your enterprise monitoring environment by configuring a set of predefined alerts that seamlessly integrates with your existing enterprise monitoring tools.
Activity Reporting and Visualization. Activity Reporting and Visualization in Hortonworks SmartSense 1.3 (available separately) helps Hadoop operators understand how their cluster operates.
Try out the latest HDP features and functionality with Hortonworks Sandbox, or install and configure your clusters to set HDP up for a production environment.
Progressive Insurance is one of the largest U.S. auto insurance companies. The team turned to Hortonworks Data Platform to transform its business with massive ingest of new types of data. Progressive uses HDP for ad placement and to store driving data for its usage-based insurance products.
Download the Product Guide
In this guide, discover:
The technology components of HDP and its blueprint for Enterprise Hadoop.
How HDP integrates and complements your existing data systems.
The versatility of applications enabled by Apache Hadoop YARN at the core of HDP.
How HDP provides security, operations and governance capabilities.
Deployment for HDP from Linux…
How to Optimize your Data Architecture with Hadoop
You may be up all night wondering how enterprise organizations deal with large data volumes and data varieties without significantly increasing costs. And perhaps your existing data architectures are not equipped to handle today's data challenges? Join this webinar to learn how to optimize your data architecture and gain significant cost savings with Hadoop. We…
Why a Connected Data Strategy is critical to the future of your data
The advent of big data revolutionized analytics and data science and created the concept of new data platforms, allowing enterprises to store, access and analyze vast amounts of historical data. The world of big data was born. But existing data platforms need…
The Financial regulators are driving a Data Evolution
Traditionally, technology moves fast and regulators react slowly. When technology leaps forward, it enables financial firms to change the nature of their business, often into unregulated territory; regulators then pass regulation to catch up. This model can work in slow-moving markets, but in today's interconnected…
Try the Latest Innovations in Apache Spark and Apache Zeppelin with Hortonworks 2.5 Sandbox
With the release of Hortonworks 2.5 Sandbox, several exciting new features have been added to Apache Spark and Apache Zeppelin.
Apache Spark Updates
One of the most powerful new Hortonworks 2.5 Sandbox features is the ability to run two versions of Spark side by side in the same environment: a Generally Available (GA) Spark 1.6.2 and a…
Try the latest innovations in the Apache Hadoop ecosystem with Hortonworks 2.5 Sandbox
It’s never been easier to get started with Apache Hadoop. The Hortonworks Sandbox combines 100% open-source Apache Hadoop and its data access engines (Apache Spark, Apache Hive, Apache HBase, Apache Solr, Apache Pig) with enterprise-grade Operations (Apache Ambari), Security (Apache Ranger and Apache Knox) and Governance (Apache Atlas). The Sandbox also provides tools for devOps,…
Building a successful enterprise grade IoT platform
Five key capabilities for IoT implementations
IoT connected devices are turning up everywhere. Every major communications carrier is offering its own IoT platform. And hundreds of technology companies are offering capabilities for IoT use case implementation. But the crux of the matter is not what options there are, but how to make it all work together.…
How PepsiCo’s Big Data Strategy is Disrupting CPG Retail Analytics
Like all consumer packaged goods (CPG) companies, PepsiCo relies on huge volumes of data to accurately replenish its retailers with the appropriate amount and type of product. Across the CPG industry, most analysts exclusively rely on Excel and Access for data wrangling, but as PepsiCo’s data surpassed the capabilities of those tools, they knew they…
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.