DataTorrent

Hadoop’s Platform for Real-time Stream Analytics

As a native Hadoop solution, DataTorrent lets you use your existing Hadoop environment for real-time stream analytics at massive scale. Processing billions of events per second with sub-second latency, DataTorrent supports today’s most demanding, mission-critical big data applications.

Common use cases include processing machine data for the Internet of Things, fraud detection in financial services, log analysis and IT operations, geolocation services, social apps, and more.

DataTorrent in the Modern Data Architecture

[Figure: Hortonworks-DataTorrent MDA, 2014]

With DataTorrent, you can:

1. Use Hadoop for real-time processing: DataTorrent installs on your existing cluster and coexists with your current batch jobs and Hadoop tools.

2. Plug in any data source, structured or unstructured, and support any business logic at any level of computational complexity.

3. Focus on your code, not on managing the infrastructure. DataTorrent automatically handles all runtime operations, such as scaling, resource optimization, high availability, state snapshotting, and dynamic application modification.

4. Accommodate future changes in load, distribution, or business requirements as your needs evolve, without having to change your stack or code.

Reference Architecture

[Figure: DataTorrent reference architecture, 2014]

Key Features

The platform’s unparalleled performance and enterprise-grade features simplify developing and running real-time stream applications.

1. Linear Scalability: DataTorrent automatically scales to accommodate any data size and processing need. Linear scalability with sub-second latency is guaranteed, even while processing hundreds of millions of events per second.

2. High Performance: DataTorrent supports massive throughput with sub-second latency, performing all computations in memory. Each container handles massive ingestion and computation, resulting in better utilization of your infrastructure.

3. Built-in Fault Tolerance: Applications self-heal with no data loss, no state loss, and no human intervention. Highly efficient, distributed, automatic state snapshots enable checkpointing with minimal impact on latency. You can even enhance your code and update your application while it is running.

4. Easy Data Integration: Easily integrate any data flow using pre-configured input/output adapters for various message buses and databases. Automatically integrate your real-time applications with your technology stack using Java, configuration files, or the CLI.

5. Develop Faster: Shorten time to market with DataTorrent’s extensive open-source library of pre-configured Operators and application templates. A rich set of tools and an interactive user interface are provided for monitoring, debugging, and charting real-time applications.
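To make the fault-tolerance feature more concrete, here is a minimal sketch of the general idea behind state snapshots: an operator's state is saved periodically, and after a failure the state is restored and the input is replayed from the last snapshot. This is not DataTorrent's API. The names CountingOperator and run_with_checkpoints are hypothetical, the "crash" is simulated, and the sketch assumes the input can be re-read from a known offset.

```python
import copy


class CountingOperator:
    """Hypothetical stateful operator that counts events per key."""

    def __init__(self):
        self.counts = {}

    def process(self, event):
        self.counts[event] = self.counts.get(event, 0) + 1

    def snapshot(self):
        # Copy the state so later processing cannot mutate the snapshot.
        return copy.deepcopy(self.counts)

    def restore(self, state):
        self.counts = copy.deepcopy(state)


def run_with_checkpoints(events, checkpoint_every, fail_at=None):
    """Process events, snapshotting state every `checkpoint_every` events.

    If `fail_at` is given, simulate one crash just before that offset:
    in-memory state is discarded, the last snapshot is restored, and
    processing resumes (replays) from the snapshot's offset.
    Assumption: the source is replayable from any offset.
    """
    op = CountingOperator()
    checkpoint = (0, op.snapshot())  # (next offset to process, saved state)
    offset = 0
    already_failed = False

    while offset < len(events):
        if fail_at is not None and offset == fail_at and not already_failed:
            already_failed = True
            op = CountingOperator()          # in-memory state is lost
            offset, saved_state = checkpoint
            op.restore(saved_state)          # recover from last snapshot
            continue

        op.process(events[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, op.snapshot())

    return op.counts


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "b", "a", "a"]
    clean = run_with_checkpoints(events, checkpoint_every=3)
    recovered = run_with_checkpoints(events, checkpoint_every=3, fail_at=5)
    print(clean == recovered)
```

In this sketch the run that crashes ends with the same counts as the run that does not, because every event processed after the last snapshot is replayed exactly once against the restored state. A production system would also need durable snapshot storage and coordination across operators, which this example leaves out.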