Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low-latency (real-time) data ingestion, flexible data exploration, and fast data aggregation. Existing Druid deployments have scaled to trillions of events and petabytes of data. Druid is most commonly used to power user-facing analytic applications.
Druid and the Druid logo are copyright Metamarkets Group Inc.
Druid is a registered trademark of Metamarkets Group Inc.
|Sub-Second Queries||Druid delivers sub-second queries, even when you have terabytes of data and dozens of dimensions.|
|Real-Time Data Ingestion||Druid makes real-time a reality. Query data seconds after it arrives. Native integration with Apache Kafka makes it simple to enable real-time analytics.|
|Integrated with Apache Hive||Build OLAP cubes and run sub-second SQL queries using any Hive-compatible tool.|
|Apache Ambari Integration||Apache Ambari makes deploying, configuring, and monitoring Druid a breeze.|
Hortonworks focuses on enabling fast, scalable analytics that seamlessly combine historical and real-time data.
A very common request from many customers is to be able to index text in image files; for example, text in scanned PNG files. In this tutorial we are going to walk through how to do this with Solr. Prerequisites: Download the Hortonworks Sandbox. Complete the Learning the Ropes of the HDP Sandbox tutorial. Step-by-step guide […]
Introduction JReport is an embedded BI reporting tool that can easily extract and visualize data from the Hortonworks Data Platform 2.3 using the Apache Hive JDBC driver. You can then create reports, dashboards, and data analysis, which can be embedded into your own applications. In this tutorial we are going to walk through the following steps to […]
Introduction R is a popular tool for statistics and data analysis. It has rich visualization capabilities and a large collection of libraries that have been developed and maintained by the R developer community. One drawback to R is that it’s designed to run on in-memory data, which makes it unsuitable for large datasets. Spark is […]
Introduction This tutorial will teach you how to set up a full development environment for developing and debugging Spark applications. For this tutorial we’ll be using Scala, but Spark also supports development with Java, Python, and R. The Java version of this tutorial can be found here, and the Python version here. We’ll be using […]
Pivotal HAWQ provides strong support for low-latency analytic SQL queries, coupled with massively parallel machine learning capabilities on Hortonworks Data Platform (HDP). HAWQ is the world’s leading SQL on Hadoop tool. It provides the richest SQL dialect with an extensive data science library called MADlib at millisecond query response times. HAWQ enables discovery-based analysis of […]
Spark 1.6 Technical Preview – with HDP 2.3 This technical preview allows you to evaluate Apache Spark 1.6 on YARN with HDP 2.3. With YARN, Hadoop supports various types of workloads. Spark on YARN becomes yet another workload running against the same set of hardware resources. This technical preview describes how to: Run Spark on […]
Spark 1.5.1 Technical Preview – with HDP This technical preview allows you to evaluate Apache Spark 1.5.1 on YARN with HDP 2.3. With YARN, Hadoop supports various types of workloads. Spark on YARN becomes yet another workload running against the same set of hardware resources. This technical preview describes how to: Run Spark on YARN […]
Introduction The Azure cloud infrastructure has become a common place for users to deploy virtual machines on the cloud due to its flexibility, ease of deployment, and cost benefits. Microsoft has expanded Azure to include a marketplace with thousands of certified, open source, and community software applications and developer services, pre-configured for Microsoft Azure. This […]
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.