Newbold Advisors is a Hortonworks® services partner that works with customers in the oil and gas industry. The company delivers big data analytics strategies and solutions across all segments of the oil and gas industry. I recently spoke with Ram Seetepalli, Senior Director at Newbold Advisors. We discussed the challenges facing midstream companies and how organizations in that sector aggressively leverage all of their data assets using a modern data architecture with Apache Hadoop at its core. This post summarizes the topics that we reviewed related to the midstream oil and gas market.
Although commodity prices are significantly lower in 2015, oil and gas production across North America continues to increase. This puts pressure on midstream companies to continue building out new infrastructure (such as pipelines) and to modify existing infrastructure to move product from the well site to a refinery, processor, or storage facility.
As midstream companies deploy new data-enabled infrastructure, the most competitive oil and gas Data-First enterprises will consume the full spectrum of available data using advanced analytic applications. These applications provide a single view of assets and processes and deliver predictive analytics using machine learning algorithms. Hadoop is at the center of the Data-First midstream enterprise, providing significant advantages in processing power, scale, and efficiency—all integrated with existing operational systems of record.
Leading midstream companies recognize that current technologies are not adequate to meet the industry's most important challenges. Advanced analytics applications can help a midstream company become more efficient and grow margins in some of the following ways:
Given the high number of variables that are involved in the ongoing analysis of pipeline portfolios, it can be challenging to maximize transportation fuel efficiency. Variables such as ambient and ground temperatures, pipeline pressure, and the volume of gas flow are interrelated. Although Hadoop is best known for its efficient data storage capabilities, the platform's processing efficiency stands out in use cases like this one.
The majority of oil pipeline rates are based on the oil pipeline rate index approved by the FERC, which is adjusted regularly. As the index changes, companies must be able to optimize fuel consumption per unit of transported product and properly correlate all key data points. With Hadoop they have more confidence in this optimization, because they can base their analysis on a combination of real-time SCADA data and many years of historical data from prior pipeline operations.
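To make this concrete, here is a minimal Python sketch of that kind of optimization: fitting fuel consumption per unit of transported product against the interrelated operating variables mentioned above. The data is entirely synthetic and the variable names and coefficients are illustrative assumptions, not values from any real pipeline; at scale, the same regression would be trained on years of SCADA history stored in Hadoop rather than a thousand in-memory rows.

```python
import numpy as np

# Hypothetical sketch with synthetic data: model fuel consumed per unit
# of transported product as a linear function of correlated operating
# variables. All names and coefficients are illustrative assumptions.
rng = np.random.default_rng(42)
n = 1000

ambient_temp = rng.uniform(-10, 35, n)    # degrees C
line_pressure = rng.uniform(40, 100, n)   # bar
gas_flow = rng.uniform(5, 50, n)          # MMcf/d

# Synthetic "historical" fuel-use observations with measurement noise.
fuel_per_unit = (
    0.8 - 0.004 * ambient_temp + 0.002 * line_pressure
    + 0.01 * gas_flow + rng.normal(0, 0.02, n)
)

# Ordinary least squares fit; on a Hadoop cluster the same model could
# be trained across the full multi-year SCADA archive.
X = np.column_stack([np.ones(n), ambient_temp, line_pressure, gas_flow])
coef, *_ = np.linalg.lstsq(X, fuel_per_unit, rcond=None)

# Predict fuel use per unit for a planned operating point.
planned = np.array([1.0, 20.0, 70.0, 30.0])
print(round(float(planned @ coef), 3))
```

The fitted coefficients quantify how much each variable moves fuel consumption, which is the kind of correlation the rate-index analysis below depends on.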
Pipeline pressures regularly fluctuate. Most pressure drops are related to normal activities such as power plants drawing fuel or chemical plants ramping up production. Other pressure drops could be the result of anomalous conditions, such as ruptures or leaks. Regulators press pipeline managers to control risk posed by those fluctuations by building automated systems that can both accurately detect and respond to anomalies.
SCADA systems are capable of monitoring these pressure drops, but they are not well suited to interpreting these occurrences and categorizing them as either standard or anomalous fluctuations.
Hadoop, on the other hand, is ideally suited to streaming analysis of SCADA data, generating real-time alerts from rules-based algorithms. If the cause of a pressure drop cannot be inferred from the available data, the system sends the appropriate notifications.
Producers allocate specific gas volumes for transportation to markets one day prior to delivery. They base those allocations on the current production from gas wells. These “day ahead” predictions are subject to great uncertainty because of corresponding uncertainty in the gas production process. It is difficult to know in advance from which location the gas will enter, at what volumes, and at what pressure.
This can leave midstream companies scrambling to configure the pipeline to manage volumes different from what they expected. This often results in a sub-optimal configuration and reduced revenues from the asset.
While uncertain, producer volumes may follow repeatable patterns, and if so, a proper forecasting algorithm could provide useful insight and guidance on expected volumes from producers. Moreover, models with big data inputs can reveal ways in which midstream companies might use pricing incentives that induce producers to smooth volumes for their own benefit.
Similar to supply, the demand side contains some inherent uncertainties, particularly with the rise of gas-fired power plants attached to interstate pipelines. The electricity market may provide signals that can be correlated to activity at these power plants and linked to future demand for gas. Big data models give producers longer-term sense-and-respond capabilities. These insights allow midstream companies to more effectively balance and further optimize the pipeline.
Midstream companies produce and distribute comprehensive reports on pipeline condition and activities. Those reports typically contain adds, withdrawals, and storage system activity: an operational snapshot of the business from the previous day.
While the reports contain extremely useful information, they provide a static view of a dynamic operation and so do not efficiently serve the needs of some stakeholders. The facts and insights typically delivered in these reports can be converted more efficiently into real-time or near-real-time interactive visualizations that can play back significant events and serve actionable views to the various decision makers.
Midstream companies attempt to optimize revenue and operating costs across multiple pipeline systems. This requires analysis for the transportation, operations, finance, and other teams in those enterprises.
Capturing data from all of the pipelines and combining that into a single multi-pipeline dataset provides the information necessary for exploratory data discovery—to build statistical and analytical models across pipelines that drive better efficiency.
Forward-thinking, innovative midstream organizations can take advantage of the unprecedented volume of new types of data. Emerging types of data, such as machine and sensor data, geolocation data, weather data, and log data become valuable at high volumes, especially when correlated against other data sets as part of a shared enterprise data lake within Hortonworks Data Platform (HDP).
The patterns within this data fuel machine learning applications designed to better understand and analyze many critical aspects of a midstream company's operations, and Hortonworks understands how to integrate this predictive capability into a modern data architecture at those companies.