October 04, 2016

Servient and Hortonworks in Energy: Millions of Dollars Saved in Trading Irregularities e-Discovery Project


One of the most enjoyable parts of my job is working with customers and partners who have innovated on the Hortonworks Connected Data Platform, companies like Servient. Here's a real example of a recent use case in the energy vertical, for a customer we worked with together. I've removed the customer's name for obvious reasons.

Many industries face the same set of problems: disparate, disconnected data with no way to manage or search it, incompatible taxonomies, the inability to manage data acquisition and divestiture, slow responses to government requests, the inability to scale, and so on. Energy is no exception.

To provide a little more context, this is an industry that's highly regulated and in a tough financial spot because of the collapse in oil prices. Anything companies can do to save or defer costs helps. Many are looking for a solution to deal with their massive volumes of unstructured data. They need help with everything from e-discovery to meeting regulatory and legal requirements to optimizing customized production processes.

Servient combines a modern architecture, machine learning, and a flexible user experience in an easy-to-use visual setting that can help with any or all of these issues. We already have multiple joint installations in energy helping customers manage their growing silos of unstructured data with these types of business goals in mind. These can be on-premises, in the cloud, or hybrid, which makes the solution highly adaptive and cost-effective. We have found that one of the quickest ways to get moving is to help a client solve any potential eDiscovery or regulatory issues.

Servient processes data fast: 250,000+ documents per hour across hundreds of formats. It is a distributed system that can process data while it is being used for searching, machine learning, and so on. In other words, there is no downtime while ingesting data; the system remains usable throughout. Servient's architecture is built on big data technologies supported by Hortonworks' platform, including HBase, Kafka, and SolrCloud.

Case Study

In one recent use case, we worked with a company to build a multi-terabyte e-discovery repository from disconnected silos of unstructured data to investigate potential energy trading irregularities.

The initial 9 terabytes of data were collected from instant messages, emails, and ShareFile. It took several days to process the data and place it into a central repository ready for searching and review. Using a proprietary Kafka-based processing engine saved weeks of time. The initial 9 TB (some 31 million documents) was then filtered down to 2 TB and 750,000 documents by the client using powerful search and filtering techniques.
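As a rough sanity check on those figures, the short Python sketch below works through the arithmetic. It assumes, purely for illustration, that the quoted floor of 250,000 documents per hour applied to this engagement; the post does not state the actual rate achieved, and the variable and function names are hypothetical.

```python
# Back-of-envelope check using only figures quoted in the post.
# ASSUMPTION: the generic "250,000+ documents per hour" rate applied to this
# case; the actual rate for the engagement is not stated.
DOCS_PER_HOUR = 250_000
INITIAL_DOCS = 31_000_000   # "some 31 million documents"
INITIAL_TB = 9
FILTERED_DOCS = 750_000
FILTERED_TB = 2


def processing_hours(docs: int, docs_per_hour: int) -> float:
    """Hours needed to ingest `docs` at a constant rate."""
    return docs / docs_per_hour


hours = processing_hours(INITIAL_DOCS, DOCS_PER_HOUR)
days = hours / 24

# Share of the original corpus left for review after filtering.
doc_fraction = FILTERED_DOCS / INITIAL_DOCS
volume_fraction = FILTERED_TB / INITIAL_TB

print(f"Ingest time: {hours:.0f} hours (about {days:.1f} days)")
print(f"Documents kept: {doc_fraction:.1%}")
print(f"Volume kept: {volume_fraction:.0%}")
```

At that assumed rate, ingestion works out to 124 hours, or a little over five days, which is consistent with "several days". The filtering step leaves roughly 2.4% of the documents and about 22% of the data volume for review.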

Compared to similar previous investigations by the client using traditional eDiscovery technology, the Servient solution saved well over $1M and was completed in less than half the time.

Beyond this, the company has also discovered that new cases can be set up in a few minutes and new users can access the system at no added cost. Calls on documents are preserved and can be reused with no additional human intervention, and the machine learning models can be reused at any time for new issues. Now that it's set up, the entire process can be quickly replicated for anything from litigation to government and internal investigations.

In comparison, traditional solutions for archiving, data management, and eDiscovery are often overwhelmed by volume, global locales, and outdated tools. Regulatory compliance and eDiscovery are very common issues that most, if not all, large companies face. Servient and Hortonworks can quickly create a modern, cost-effective environment in 30 to 60 days, at a significantly reduced cost, that fully leverages the capability of the Hadoop ecosystem. The Servient-Hortonworks solution can provide almost immediate savings of time and expense.

If you would like to find out more, please contact me, or for Servient, contact Bill Schieffelin. More detail is also available from Servient.
