A Basin-Aware Global Framework for Computationally Efficient Surface Water Inundation Prediction

Tashie, Arik M.; Gerg, Isaac D.; Koester, Evan; Hoyos, Carlos D.; Galindo, Eduardo; Farnham, David J.

doi:10.5194/egusphere-2026-1527

Preprints

24 Apr 2026

Abstract. Predicting surface water inundation at regional to global scales presents a fundamental tension: bespoke local models achieve high accuracy but require proprietary data and are difficult to scale, while globally trained systems offer broad coverage but demand substantial computational infrastructure and may lack flexibility for regional customization. We present the Basin-Aware Global Inundation Modeling framework (BAGIM), which addresses this gap by combining globally available, freely accessible datasets with basin-scale calibration to capture regional hydrological specificity. We evaluate six model architectures across eight geographically diverse basins to test three hypotheses: (1) that hydrologically meaningful feature engineering is more impactful than architectural complexity, (2) that basin-scale training mitigates regional biases in global datasets, and (3) that basin-aware models can generalize to extreme events beyond the training distribution. Our experiments demonstrate that tree-based ensembles (XGBoost, Random Forest) consistently outperform more complex deep learning architectures, achieving median F1 scores of approximately 0.5 against OPERA DSWx-S1 reference data, performance that approaches the inherent uncertainty ceiling imposed by disagreement among remote sensing products themselves in settings with small, shallow, and intermittent water bodies. We find that features commonly assumed essential for operational flood forecasting (i.e., coincident river-basin streamflow, Height Above Nearest Drainage, and elevation) are neither sufficient nor strictly necessary for reliable prediction, with well-engineered meteorological and terrain features achieving comparable performance without explicit streamflow inputs. This challenges a core assumption underlying many current operational flood forecasting systems. Cross-basin transfer experiments reveal limited transferability, reinforcing the importance of basin-aware calibration. 
Further, models trained exclusively on non-extreme events produce directionally correct predictions for out-of-sample extremes, though with conservative bias (higher precision, lower recall). We suggest that a design philosophy prioritizing feature engineering and regional calibration over architectural complexity enables accessible deployment without sacrificing predictive skill.
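
The pixel-wise scores named above (F1, and the precision/recall trade-off reported for extremes) can be related with a short sketch. It assumes the predicted and reference masks (for example, a model output and an OPERA DSWx-S1 tile) are flattened to lists of 0/1 values with 1 marking water; the function `binary_scores` and the toy masks are hypothetical and are not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def binary_scores(pred: Sequence[int], ref: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for flattened binary water masks (1 = water)."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)  # water hit
    fp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 0)  # false alarm
    fn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 1)  # missed water
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    # Conservative prediction: flags fewer pixels, all of them correct.
    conservative = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # Liberal prediction: catches all water but adds false alarms.
    liberal = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    print(binary_scores(conservative, reference))
    print(binary_scores(liberal, reference))
```

In the toy case the conservative mask scores precision 1.0 and recall 0.5, while the liberal mask scores precision of about 0.67 and recall 1.0, which is the kind of trade-off the abstract describes for out-of-sample extremes.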

Received: 18 Mar 2026 – Discussion started: 24 Apr 2026

Status: final response (author comments only)

RC1: 'Comment on egusphere-2026-1527', Anonymous Referee #1, 03 Jun 2026

In this work, a basin-aware machine learning framework (BAGIM) is developed that combines globally available open-data products with basin-scale calibration to predict daily binary surface water extent at 30 m resolution. The authors evaluate six ML architectures across eight geographically diverse basins, test three hypotheses (H1: feature engineering over architectural complexity; H2: basin-scale training and regional bias; H3: generalization to extreme events), and report that tree-based ensembles (XGBoost, Random Forest) consistently outperform more complex deep-learning architectures. They also find that these models have limited cross-basin transferability and that streamflow inputs are not necessary for reliable prediction.
However, there are some major concerns that the authors need to address before acceptance.
Recommendation: Major revisions.
Comment-1
The training labels for inundation come from OPERA DSWx-S1, which detects all open inland water on a given day without distinguishing flood inundation; this is a major drawback in the workflow. Over a 14-month observation window, most positive-class pixels will be permanent and semi-permanent water bodies whose locations do not change (how can the model learn seasonality?). Under this interpretation, the H3 experiment mainly indicates whether a model trained on surface water masks can predict flood inundation, and the observed pattern (precision increases, recall decreases) is then unsurprising: the model correctly identifies water that was present before, during, and after the flood (high precision in the channel) and misses floodplain inundation that did not exist in the training distribution. In my understanding, this could also cause a severe class imbalance in the wet-to-dry ratio. This analysis needs to be rechecked, and the model possibly rerun, with proper justification.
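
One way to probe this concern is to split pixels by how often they are wet across the observation window and compute recall separately on persistent and transient water. The sketch below assumes a stack of flattened binary masks, one per observation date; the function names and the 0.9 persistence threshold are illustrative assumptions, not part of BAGIM.

```python
from typing import List, Sequence


def water_frequency(stack: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of observation dates on which each pixel is labeled water (1)."""
    n_dates = len(stack)
    # zip(*stack) yields, for each pixel, its values across all dates.
    return [sum(column) / n_dates for column in zip(*stack)]


def split_by_persistence(freq: Sequence[float], permanent_threshold: float = 0.9) -> List[str]:
    """Label pixels 'permanent', 'transient', or 'dry' by water frequency.

    The 0.9 threshold is a hypothetical choice for illustration.
    """
    labels = []
    for f in freq:
        if f >= permanent_threshold:
            labels.append("permanent")
        elif f > 0.0:
            labels.append("transient")
        else:
            labels.append("dry")
    return labels


def recall_on_subset(pred: Sequence[int], ref: Sequence[int], keep: Sequence[bool]) -> float:
    """Recall computed only over pixels where `keep` is True (NaN if no water there)."""
    tp = sum(1 for p, r, k in zip(pred, ref, keep) if k and p == 1 and r == 1)
    fn = sum(1 for p, r, k in zip(pred, ref, keep) if k and p == 0 and r == 1)
    return tp / (tp + fn) if tp + fn else float("nan")


if __name__ == "__main__":
    # Three pixels over four dates: a river channel, a briefly flooded cell, dry land.
    stack = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
    labels = split_by_persistence(water_frequency(stack))
    flood_ref, flood_pred = [1, 1, 0], [1, 0, 0]
    print(labels)
    print(recall_on_subset(flood_pred, flood_ref, [True] * 3))
    print(recall_on_subset(flood_pred, flood_ref, [l == "transient" for l in labels]))
```

In the toy stack, overall recall on the flood day is 0.5 but recall on the transient pixel is 0.0, the pattern a model that has mostly learned permanent water would produce.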
Comment-2
The authors also claim that streamflow features are “neither sufficient nor strictly necessary for reliable prediction” and position this as a challenge to operational streamflow-driven flood forecasting. However, the ERA5 meteorological forcings and HydroATLAS-derived catchment attributes remain in the feature set of the “No Streamflow” configuration. Removing modeled streamflow therefore does not remove independent information from the model. Streamflow at a given location is a lagged response to upstream rainfall; if the inundation model already has access to the rainfall, temperature, and catchment-attribute features, it can learn the rainfall-runoff transformation internally. I suggest that the authors redo the experiment with the meteorological forcings also removed and report the results, or properly justify the claim.
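
The lagged-response argument can be made concrete with a toy single linear reservoir, used here only as an illustrative assumption (the review indicates BAGIM's streamflow comes from an LSTM-FiLM model). With a fixed outflow fraction k per step and storage starting empty, simulated flow equals a fixed exponentially weighted sum of past rainfall, Q_t = sum over lags L = 0..t of k(1 - k)^L P_(t-L), so a model given lagged rainfall features could in principle rebuild the streamflow input. The function names are hypothetical.

```python
from typing import List, Sequence


def linear_reservoir(rain: Sequence[float], k: float) -> List[float]:
    """Simulate outflow from a single linear reservoir (storage starts empty)."""
    storage = 0.0
    flow = []
    for p in rain:
        storage += p          # rainfall enters storage
        q = k * storage       # outflow is a fixed fraction of current storage
        storage -= q
        flow.append(q)
    return flow


def flow_from_lagged_rain(rain: Sequence[float], k: float) -> List[float]:
    """The same outflow, written as a weighted sum of lagged rainfall."""
    weights = [k * (1.0 - k) ** lag for lag in range(len(rain))]
    return [
        sum(weights[lag] * rain[t - lag] for lag in range(t + 1))
        for t in range(len(rain))
    ]


if __name__ == "__main__":
    rain = [0.0, 10.0, 0.0, 0.0, 5.0, 0.0, 2.0, 0.0]
    simulated = linear_reservoir(rain, k=0.3)
    rebuilt = flow_from_lagged_rain(rain, k=0.3)
    print(max(abs(a - b) for a, b in zip(simulated, rebuilt)))
```

Both functions return the same series to floating-point precision. A real catchment response is nonlinear, so the toy only shows that streamflow can be redundant with lagged forcings, not how redundant it is in BAGIM's feature set.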
Comment-3
The authors find that BAGIM models do not generalize across basins. To deploy BAGIM in a new basin, a user must retrain the model, which requires extensive work (obtaining OPERA DSWx-S1 observations, constructing the full 30-band static feature stack, extracting ERA5 forcings, running the LSTM-FiLM streamflow model, performing cross-validation, etc.). By contrast, Google Flood Hub and GloFAS are trained once globally and deployed everywhere with effectively zero local setup. How do the authors justify the claim that BAGIM enables accessible deployment?
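
The in-basin versus cross-basin comparison behind this comment can be organized as a train-on-one, test-on-each matrix. The sketch uses a hypothetical one-feature threshold classifier and plain accuracy so it runs without dependencies; BAGIM's actual models and its F1 metric would be passed in through the `fit` and `score` arguments instead.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Split = Tuple[List[float], List[int]]  # (one feature value per pixel, binary water labels)
Model = Callable[[float], int]


def fit_threshold(train: Split) -> Model:
    """Hypothetical toy model: cut halfway between dry and wet class means.

    Assumes wet pixels have larger feature values than dry ones.
    """
    x, y = train
    dry = mean(v for v, label in zip(x, y) if label == 0)
    wet = mean(v for v, label in zip(x, y) if label == 1)
    cut = (dry + wet) / 2.0
    return lambda v: 1 if v > cut else 0


def accuracy(pred: List[int], ref: List[int]) -> float:
    """Share of pixels where prediction and reference agree."""
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def transfer_matrix(
    basins: Dict[str, Tuple[Split, Split]],
    fit: Callable[[Split], Model] = fit_threshold,
    score: Callable[[List[int], List[int]], float] = accuracy,
) -> Dict[Tuple[str, str], float]:
    """Train on each basin's train split; score on every basin's test split."""
    results: Dict[Tuple[str, str], float] = {}
    for source, (train, _) in basins.items():
        model = fit(train)
        for target, (_, (x_test, y_test)) in basins.items():
            results[(source, target)] = score([model(v) for v in x_test], y_test)
    return results


def summarize(results: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """Mean in-basin (diagonal) and cross-basin (off-diagonal) scores."""
    in_basin = [s for (a, b), s in results.items() if a == b]
    cross = [s for (a, b), s in results.items() if a != b]
    return mean(in_basin), mean(cross)


if __name__ == "__main__":
    basins = {
        "A": (([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]), ([2, 4, 6, 8], [0, 0, 1, 1])),
        "B": (([10, 11, 12, 18, 19, 20], [0, 0, 0, 1, 1, 1]), ([11, 13, 17, 19], [0, 0, 1, 1])),
    }
    print(summarize(transfer_matrix(basins)))
```

In the toy data the two basins encode the same wet/dry relationship at different feature values, so each model scores 1.0 at home and 0.5 (chance, for balanced classes) abroad. Filling every cell of this matrix is the per-basin retraining cost the reviewer asks the authors to weigh.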

Citation: https://doi.org/10.5194/egusphere-2026-1527-RC1
RC2: 'Comment on egusphere-2026-1527', Anonymous Referee #2, 20 Jul 2026
I think this paper does a very good job of framing the global context of hydrological modeling. I think that the authors are, overall, fair and transparent about their experiments and their interpretation of the results. The overall quality of the presentation is good. The tables and figures are appropriate and helpful.

I have several significant high-level concerns:

The introduction narrative needs citations, or even anecdotal evidence. I don’t generally disagree with the contents of the introduction, but many of the paragraphs make claims without evidence.

I suggest you delete H1 entirely from the paper. It is about feature engineering being more important than model architecture. Famously, the 2009 paper “The Unreasonable Effectiveness of Data” spurred many additional papers testing the same idea, and there are myriad papers in hydrology and other fields that pit model architectures against each other. Anecdotally, I point you to the evolution of commercial LLM services, which keep expanding their context windows and adding billions more parameters. Furthermore, I don’t think your experiments make a proper and robust effort to sample the wide range of datasets or model architectures available. The results spend more time discussing how this hypothesis was proven by changing model architectures than by increasing the volume of data or improving and expanding the feature engineering. Testing those dimensions quickly becomes a combinatorial explosion.
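
The scale of that explosion is easy to estimate. Using the paper's six architectures and eight basins, and assuming (hypothetically) five feature groups and three training-data volumes, a full factorial design over every non-empty feature-group subset needs the number of runs computed below. The function name is illustrative.

```python
def n_experiments(
    n_architectures: int,
    n_feature_groups: int,
    n_basins: int,
    n_data_volumes: int = 1,
) -> int:
    """Runs for a full factorial design over architectures, feature-group subsets,
    basins, and training-data volumes (one run per combination, no CV repeats)."""
    n_subsets = 2 ** n_feature_groups - 1  # every non-empty combination of feature groups
    return n_architectures * n_subsets * n_basins * n_data_volumes


if __name__ == "__main__":
    print(n_experiments(6, 5, 8))     # six architectures, eight basins, five feature groups
    print(n_experiments(6, 5, 8, 3))  # plus three hypothetical data volumes
```

Under these assumptions the design needs 1488 runs, rising to 4464 with three data volumes, before counting repeated cross-validation folds.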

Hypotheses 2 and 3 have more scientific value than H1. However, I’m not convinced that the experimental design actually constitutes evidence for these hypotheses. I elaborate in the following point about the methods.

About 10 pages, all of Sections 2.1–2.4, generally describe choices of datasets and routine machine learning preprocessing steps. Only in Section 2.4.4 does the manuscript begin to describe an experimental design. This balance needs to be inverted: the simpler portions of data preparation and model selection can be relegated to an appendix or trimmed. I think these explanations are necessary and helpful, but they do not help another scientist recreate the work being presented or justify the experimental design. Section 3.5, for instance, presents ensemble methods for using model results, which could form the foundation for exploring H2 and H3, but instead it receives only a figure and two paragraphs.
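
Two common ways to combine binary inundation maps from several models are a pixel-wise majority vote and averaging per-pixel water probabilities before thresholding; the per-pixel agreement fraction then gives a simple measure of model spread. This is a generic sketch under those assumptions, with hypothetical function names, not a reconstruction of the ensemble methods in the manuscript's Section 3.5.

```python
from typing import List, Sequence


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Pixel-wise majority vote over binary masks from several models (ties count as dry)."""
    n_models = len(masks)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*masks)]


def mean_probability(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-pixel water probabilities across models, then threshold."""
    n_models = len(probs)
    return [1 if sum(column) / n_models >= threshold else 0 for column in zip(*probs)]


def agreement(masks: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of models predicting water at each pixel (a simple spread measure)."""
    return [sum(column) / len(masks) for column in zip(*masks)]


if __name__ == "__main__":
    # Three hypothetical models, four pixels each.
    masks = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0]]
    print(majority_vote(masks))
    print(agreement(masks))
    print(mean_probability([[0.9, 0.2], [0.6, 0.4], [0.1, 0.3]]))
```

Pixels with agreement near 0.5 mark where the models disagree most, which is one place an ensemble analysis could feed into testing H2 and H3.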
Citation: https://doi.org/10.5194/egusphere-2026-1527-RC2

Latest update: 29 Jul 2026

Short summary

Surface water extent reflects interactions of meteorology, terrain, and land cover that vary regionally. We present a basin-aware framework using open global data to predict inundation at 30-m resolution. Evaluating six architectures across eight diverse basins, we find that hydrologically meaningful feature engineering outweighs model complexity, tree-based ensembles match or exceed deep learning without GPU infrastructure, and basin-scale calibration mitigates regional biases in global data.

