February 08, 2016

Three Open Source Software Projects Transforming Oil & Gas Companies


We are already more than a month into 2016 and it’s anything but business as usual in Oil and Gas. Current markets are making companies rethink every aspect of their business model, foundational cost structure, and strategy for delivering value to customers and shareholders.

The same thorough scrutiny should be applied to traditional enterprise software tools and platforms. In fact, open source innovations in enterprise software promise both dramatic cost optimization and new ways of approaching traditional O&G domain challenges.

Why Go Open Source? Margins, Cash Flow, and Shareholder Value

Oil and Gas IT organizations are familiar with the hardships imposed by proprietary software delivery models: licensing fees, long-term vendor lock-in, contractual “gotcha” clauses with no flexibility, and a “required” high-cost hardware infrastructure to run it all. Add to those traditional software’s lengthy development cycles and a myriad of data integration and data flow challenges, and it’s no wonder that innovation to accelerate business value comes more slowly than you need.

In today’s cost-constrained environment, companies are taking matters into their own hands: abandoning outdated approaches, data platforms, and associated tools to significantly lower both their CAPEX and OPEX, while freeing themselves to take a more innovative approach to delivering business value.

Open Source Innovation for Enterprise Data Flows: Apache™ NiFi Simplifies Your Data Ingestion Pipeline

Imagine secure, prioritized drilling data moving through a data pipeline. Imagine true real-time decisions that help you maximize your CAPEX leverage.

Apache NiFi is a real-time, visual multi-directional data flow solution that was first developed in 2006 at the National Security Agency. NiFi provides data pipeline ingest that securely connects to any distributed data source (in the oilfield or in the data center). NiFi includes comprehensive data acquisition, simple event processing, and prioritized data flow combined with fine-grained, always-on tracking of all data transformations and an intuitive visual user interface to constantly update how data is captured, curated and conducted to a central repository.
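To make the ideas of prioritized data flow and always-on tracking of transformations more concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use NiFi's API; the class names (FlowRecord, PrioritizedFlow), the rig data sources, and the priority numbers are all hypothetical, chosen only to illustrate the concept.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """One unit of data moving through the flow (hypothetical structure)."""
    source: str
    payload: dict
    provenance: list = field(default_factory=list)

    def apply(self, step_name, transform):
        """Run a transformation and log its name in the provenance trail."""
        self.payload = transform(self.payload)
        self.provenance.append(step_name)


class PrioritizedFlow:
    """Releases records lowest priority number first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def ingest(self, record, priority):
        record.provenance.append(f"ingested from {record.source}")
        heapq.heappush(self._heap, (priority, next(self._order), record))

    def drain(self):
        while self._heap:
            _, _, record = heapq.heappop(self._heap)
            yield record


# Illustrative data only: source names and values are made up.
flow = PrioritizedFlow()
flow.ingest(FlowRecord("rig-7/mud-log", {"depth_ft": 9120}), priority=5)
flow.ingest(FlowRecord("rig-7/pressure", {"psi": 4875}), priority=1)
flow.ingest(FlowRecord("rig-7/temperature", {"deg_f": 212}), priority=5)

delivered = []
for record in flow.drain():
    record.apply("tag_units", lambda p: {**p, "units_checked": True})
    delivered.append(record)

for record in delivered:
    print(record.source, record.provenance)
```

In this sketch the pressure reading is released first because it carries the most urgent priority, and every record keeps a running list of what happened to it. A real data flow tool layers security, a visual interface, and persistent storage on top of the same two ideas.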

NiFi addresses all data formats, protocols, and schemas—handling data flowing in any size, shape or speed while providing data security, comprehensive traceability, prioritization of resources, and tools to transform it while in flight. From the very beginning, NiFi was architected for resilience in remote environments that have unreliable or intermittent availability in communication networks.
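One common way to get resilience over an intermittent network link is a store-and-forward buffer: hold data locally while the link is down and deliver it in order once the link returns. The plain-Python sketch below illustrates that general pattern under simple assumptions. It is not how NiFi is implemented internally, and StoreAndForwardBuffer and FlakyLink are hypothetical names invented for this example.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Queues messages locally and delivers them in order when the link allows."""

    def __init__(self, send):
        # `send` is any callable that raises ConnectionError when the link is down.
        self._send = send
        self._pending = deque()

    def submit(self, message):
        """Queue a message, then try to deliver everything that is waiting."""
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Send queued messages in order; stop at the first failure and keep the rest."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent

    @property
    def backlog(self):
        return len(self._pending)


class FlakyLink:
    """Stand-in for a remote-site network link that can drop out."""

    def __init__(self):
        self.up = True
        self.received = []

    def send(self, message):
        if not self.up:
            raise ConnectionError("link down")
        self.received.append(message)


link = FlakyLink()
outbox = StoreAndForwardBuffer(link.send)

outbox.submit("reading-1")          # link is up, delivered immediately
link.up = False
outbox.submit("reading-2")          # held locally during the outage
outbox.submit("reading-3")          # held locally during the outage
backlog_during_outage = outbox.backlog
link.up = True
outbox.flush()                      # link restored, backlog delivered in order

print(backlog_during_outage, link.received)
```

A message is removed from the queue only after a successful send, so nothing is lost if the link fails partway through a flush. A production system would also persist the queue to disk so that it survives a restart.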

Apache NiFi Delivers Speedy Time-to-Value

Oil and gas companies already working with NiFi are benefiting from a modern data flow solution that is 100% open source (without licensing fees): a simple, flexible tool that finally delivers on the promise of quick “time-to-value.”

In a few hours, they are building real-time data flows and enterprise data integration pipelines (including streaming data) instead of waiting the standard weeks or months that traditional tools require to hand-code long chains of messaging and scripting solutions.

Apache NiFi’s API is very well documented and all user documentation is available both online and within the tool for easy accessibility. You won’t have to dig around in the code to find out how various components work.

Open Source Innovation for Machine Learning: Results in Minutes with Apache Spark and Apache Zeppelin

Across the industry, companies are deploying Apache Spark for fast, efficient analysis of Oil and Gas datasets. They are gaining second-by-second visibility into reservoir characterization, drilling operations, and production optimization, and downstream organizations are accelerating research and development and the analysis of Point-of-Sale data.

Spark provides a fast, in-memory data processing engine for data streaming, seamless SQL integration, and machine learning capabilities that can dramatically shorten time to value in helping solve complex data analysis challenges for Oil and Gas research scientists, data scientists, and engineers.

A key focus area for Hortonworks in Oil and Gas will be to contribute additional Spark algorithms and models that increase the pace of adoption across all segments.

Visualize Your Spark Results in Apache Zeppelin


Apache Zeppelin is an interactive, browser-based notebook environment often used in conjunction with Spark. Together, they provide Oil and Gas data science and engineering research teams the ability to accelerate their data analysis workflows across all dimensions including data discovery and visualization.

Zeppelin provides an abstraction layer for the Spark data processing engine and underlying Apache Hadoop® cluster infrastructure, and it supports complex data workflows. Teams can dramatically improve productivity when they collaborate over interactive, “point-in-time” analysis of challenging datasets in short time windows. The Zeppelin notebook supports a wide range of options, including SparkSQL, Apache Hive, Python, and Scala.

It Is No Longer an Option to “Wait and See How the Market Plays Out”

I am a big proponent of evaluating options, choosing a path, making a decision, and beginning execution immediately. Who enjoys or benefits from lengthy, drawn-out discussions that paralyze final decision-making?

Strongly consider trying out Apache NiFi, Apache Spark, and Apache Zeppelin today. Oil and Gas companies that I work with are already lowering CAPEX and OPEX and doing more for their business in much less time.

Those early movers can always course-correct and make adjustments, but only because they have already gotten started. With the current state of the market, waiting any longer is not an option.
