Preprints
https://doi.org/10.5194/egusphere-2025-5842
04 Jan 2026
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Real-time Monitoring of Petroleum Hydrocarbons in Groundwater using Hybrid Machine Learning Architectures

Chen Lester Reñon Wu, R. Martijn Wagterveld, Luuk C. Rietveld, and Boris M. van Breukelen

Abstract. Monitoring petroleum hydrocarbon (PHC) plumes in groundwater is essential for managing oil contamination but is often hindered by high costs. We evaluated machine learning (ML) frameworks that estimate concentrations of benzene, ethylbenzene, and xylenes (BEX) from affordable, in situ water quality parameters (iWQPs): pH, dissolved oxygen, electrical conductivity, and oxidation-reduction potential. Because field data are scarce, we trained and tested the models on high-resolution virtual data generated by a reactive transport model. We compared a long short-term memory (LSTM) network against classical algorithms (multiple linear regression, random forest, support vector regression, XGBoost) and an LSTM-XGBoost hybrid. Model performance depended on the underlying geochemical relationship between iWQPs and BEX. Accurate predictions (R² ≥ 0.80, MAPE < 2.3 %) were achieved when iWQPs were strongly correlated with BEX degradation (e.g., when BEX acted as a primary electron donor); the LSTM model yielded predictions within a 5 % error margin for 70 % of the test cases. Performance declined sharply (R² < 0) during periods when iWQPs were instead correlated with non-volatile dissolved organic carbon, another component of dissolved PHC. Incorporating hydraulic head data improved accuracy by informing the model of groundwater flow dynamics. While the LSTM model struggled to extrapolate beyond its training data (e.g., during extreme flow events), it reliably detected the direction of concentration trends, providing a valuable trigger for adaptive monitoring. We also demonstrated how a hybrid Kalman filter could capture concentration trends after source removal through recursive updating. The proposed ML framework estimates BEX levels to support improved groundwater monitoring.
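The performance figures quoted in the abstract (R², MAPE, and the share of predictions within a 5 % error margin) can be illustrated with a minimal sketch. The function names and the sample concentrations below are hypothetical and are not taken from the preprint or its datasets. The "within 5 %" criterion is assumed here to mean relative error against the reference value, which may differ from the authors' definition. The second prediction series reverses the observed trend to show how R² can fall below zero, i.e. the model does worse than simply predicting the mean.

```python
# Illustrative metric definitions only; not the authors' evaluation code.

def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when worse than the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y_true)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def fraction_within(y_true, y_pred, tol=0.05):
    """Share of predictions whose relative error is at most `tol` (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol * abs(t))
    return hits / len(y_true)


# Hypothetical BEX concentrations (arbitrary units), made up for this sketch.
observed = [10.0, 12.0, 14.0, 16.0]
close_fit = [10.2, 11.9, 14.1, 15.8]   # tracks the observed trend
wrong_trend = [16.0, 14.0, 12.0, 10.0]  # reverses the observed trend

for name, pred in [("close_fit", close_fit), ("wrong_trend", wrong_trend)]:
    print(name,
          "R2 =", round(r_squared(observed, pred), 3),
          "MAPE % =", round(mape(observed, pred), 2),
          "within 5 % =", fraction_within(observed, pred))
```

Because R² compares the residual sum of squares with the variance of the observations, a prediction series that moves opposite to the data yields a residual larger than the total variance and therefore a negative score.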

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 01 Mar 2026)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on egusphere-2025-5842', Giacomo Medici, 13 Jan 2026

Interactive computing environment

Hybrid Machine Learning Models for Estimating Petroleum Hydrocarbon Concentration in Groundwater Chen Lester R. Wu et al. https://doi.org/10.4121/0a23147e-ba85-4ba2-a058-ba199c65d711

Virtual Experiments with Reactive Transport Modelling using FloPy: Transport and Degradation of Dissolved Petroleum Hydrocarbons in Groundwater Chen Lester R. Wu et al. https://doi.org/10.4121/f7742f02-ee3a-4a84-adf1-625b4a9fd703


Viewed

Total article views: 252 (including HTML, PDF, and XML)
  • HTML: 171
  • PDF: 70
  • XML: 11
  • Total: 252
  • Supplement: 21
  • BibTeX: 8
  • EndNote: 8

Viewed (geographical distribution)

Total article views: 215 (including HTML, PDF, and XML) Thereof 215 with geography defined and 0 with unknown origin.
Latest update: 29 Jan 2026
Short summary
We developed a cost-effective way to monitor toxic petroleum contaminants in groundwater using machine learning and water quality measurements such as pH, dissolved oxygen, redox potential, and electrical conductivity. By simulating real-world conditions, we showed that machine learning models can predict contaminant concentrations because contamination events trigger reactions that alter these measurements. The presented framework complements existing monitoring strategies for better groundwater management.