Elasticsearch

The most advanced open source search and analytics engine

Elasticsearch provides a real-time, distributed, open source search and analytics platform for structured and unstructured data.

The partnership between Hortonworks and Elasticsearch lets Hortonworks customers and prospects add Elasticsearch real-time search and analytics on top of the Hortonworks Data Platform (HDP). HDP customers can complement their existing investment in flexible batch processing with software that supports use cases that are possible only through real-time interaction with the user.

Enable Real-Time Search and Analytics

Elasticsearch is a great fit for “Big Data” because its scalable, distributed nature allows it to store and search vast amounts of information in near real time. Through the Elasticsearch-Hadoop integration, HDP users working with native MapReduce, Hive, Pig, or Cascading can enhance their workflows with a search and analytics engine. Elasticsearch provides a rich query language for asking better questions and getting clearer answers, significantly faster.

Complete Integration with the Hadoop Ecosystem

Developers can write MapReduce jobs that index existing data in HDFS, making it searchable through the Elasticsearch REST API and related ecosystem. MapReduce jobs can also read their input datasets from Elasticsearch and write their output datasets to it. This deep integration extends to Hive, Pig and Cascading.
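
To make the indexing step concrete, here is a minimal sketch of the kind of payload that ends up at the Elasticsearch REST API. It builds a newline-delimited JSON body in the format accepted by the `_bulk` endpoint. The index name `logs`, the record fields, and the helper `build_bulk_body` are hypothetical and chosen only for illustration. In practice the Elasticsearch-Hadoop connector handles this serialization for you.

```python
import json


def build_bulk_body(index_name, records):
    """Render records as an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for record in records:
        # Action line: asks Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The bulk format requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records, such as a map task might emit after parsing lines from HDFS.
records = [
    {"user": "alice", "message": "disk usage at 91 percent"},
    {"user": "bob", "message": "nightly batch finished"},
]
body = build_bulk_body("logs", records)
```

The resulting `body` could be sent with an HTTP POST to `/_bulk` using the `application/x-ndjson` content type. No request is made here, so the sketch runs without a cluster.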

The Elasticsearch-Hadoop project provides a dedicated InputFormat and OutputFormat for vanilla MapReduce, Taps for reading and writing data in Cascading, and storage adapters for Pig and Hive, so you can access Elasticsearch just as if the data were in HDFS.
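
As a rough illustration of how a MapReduce job might be pointed at Elasticsearch, the sketch below assembles job properties and renders them in Hadoop's XML configuration format. The keys `es.nodes`, `es.resource` and `es.query` and the class `org.elasticsearch.hadoop.mr.EsInputFormat` come from the Elasticsearch-Hadoop connector. The host `es-host:9200`, the resource `logs`, the query string, and both helper functions are assumptions made for this example. The exact form expected for `es.resource` depends on the connector version, so check the documentation for the release you use.

```python
from xml.etree import ElementTree as ET


def es_job_properties(nodes, resource, query=None):
    """Collect the properties a job needs to read its input from Elasticsearch."""
    props = {
        # Tell Hadoop to use the connector's InputFormat for this job.
        "mapreduce.job.inputformat.class": "org.elasticsearch.hadoop.mr.EsInputFormat",
        # Where the cluster lives and which index to read.
        "es.nodes": nodes,
        "es.resource": resource,
    }
    if query is not None:
        # Optional query that restricts which documents are read.
        props["es.query"] = query
    return props


def to_hadoop_xml(props):
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical host, index and query.
props = es_job_properties("es-host:9200", "logs", query="?q=message:disk")
xml_text = to_hadoop_xml(props)
```

Writing results back would use the matching `EsOutputFormat` class in the same way. The mappers and reducers themselves stay unchanged, which is what lets the job treat Elasticsearch like any other input or output source.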

The integration enables cluster co-location by exposing shard information to Hadoop. Job tasks can run on the same machines as the Elasticsearch shards they read, which reduces network traffic and improves performance through data locality.
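
The idea behind data locality can be shown with a toy scheduler. This is a simplified sketch and not the logic used by Hadoop or the connector. The function `assign_tasks` and the host names are hypothetical. Given which host holds each shard, it places a task on that host when a worker runs there and otherwise falls back to another worker in round-robin order.

```python
def assign_tasks(shard_hosts, worker_hosts):
    """Map each shard to (worker, is_local), preferring the shard's own host."""
    if not worker_hosts:
        raise ValueError("at least one worker host is required")
    workers = sorted(worker_hosts)
    assignments = {}
    fallback = 0
    for shard, host in sorted(shard_hosts.items()):
        if host in worker_hosts:
            # A worker runs next to the shard, so the read stays on one machine.
            assignments[shard] = (host, True)
        else:
            # No co-located worker: pick the next worker, reading over the network.
            assignments[shard] = (workers[fallback % len(workers)], False)
            fallback += 1
    return assignments


# Hypothetical layout: three shards, but workers on only two of the hosts.
shard_hosts = {0: "node-a", 1: "node-b", 2: "node-c"}
worker_hosts = {"node-a", "node-b"}
plan = assign_tasks(shard_hosts, worker_hosts)
local_reads = sum(1 for _, is_local in plan.values() if is_local)
```

In this layout two of the three shards are read locally. The third has to cross the network, which is the case co-location is meant to avoid.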

For more information

Elasticsearch & Hadoop
elasticsearch-hadoop enables real-time searching against data stored in Apache Hadoop. It provides native integration with MapReduce, Hive, Pig, and Cascading, all with no customization.
