Get Started


Ready to Get Started?

Download sandbox

How can we help you?

closeClose button


As a native Hadoop solution, DataTorrent enables you to leverage your existing Hadoop environment for real-time stream analytics on a massive scale. Processing billions of events per second with sub-second latency, DataTorrent supports today’s most demanding, mission-critical, big-data applications.

Common use cases include processing machine-data for the Internet of Things, fraud detection in financial services, log analysis and IT operations, Geo-location services, Social apps, and more.

John Fanelli, VP Marketing, DataTorrent

DataTorrent in the Modern Data Architecture

Hortonworks-DataTorrent MDA 2014

With DataTorrent, you can:

1. Use Hadoop for real-time – DataTorrent installs on your existing cluster, and it co-exists with your current Batch jobs and Hadoop tools.

2. Plug-in any data source – structured or unstructured – and support any business logic and any computation complexity.

3. Focus on your code and not on managing the infrastructure. DataTorrent automatically handles all runtime operations – such as scaling, resource optimizations, high availability, state snapshotting, and dynamic application modification.

4. Sustain any future changes to load, distribution or business requirements, as your needs evolve- without having to change your stack or code.

Reference Architecture

DataTorrent ReferenceArchitecture 2014

Key Features

The platform’s unparalleled performance and enterprise-grade features simplify the development and runtime of real-time stream applications.

1. Linear Scalability: DataTorrent automatically scales to accommodate any data size and processing need. Linear scalability with sub-second latency is guaranteed – even while processing 100s of millions of events per second.

2. High Performance: DataTorrent support massive throughput- with all computations done in-memory, with sub second latency. Per container, DataTorrent allows for massive ingestion and computation, resulting in better utilization of your infrastructure.

3. Built-in fault tolerance: Applications self-heal with no data loss, no state loss or human intervention. Highly efficient and distributed automatic state snapshot enables check-pointing with minimal impact on latency. You can even enhance your code and update your app while it is running.

4. Easy Data Integration: Easily integrate any data flow with pre-configured input/output adapters to various message buses and databases. Automatically integrate your real-time applications with your technology stack using Java, config files or CLI.

Develop Faster: Shorten time to market with DataTorrent’s extensive open-source library of pre-configured Operators and application templates. A rich set of tools and an interactive user interface are provided for monitoring, debugging, and charting real time applications.

HDP Certifications

HDP - HDP Certified badge indicates this partner’s solution has been certified to work with HDP; reviewed for architectural best practices and validated against a comprehensive suite of integration test cases, benchmarked for scale under varied workloads and comprehensively documented.

Yarn Ready - Apache Hadoop YARN is the data operating system for Hadoop 2. YARN Ready certification recognizes applications that integrate with YARN and process data via pushdown computation to the cluster. Examples of a YARN ready solution includes an application that has native YARN application master or leverages scale-out capabilities of the platform like Hive, Spark and MR2.