A Basin-Aware Global Framework for Computationally Efficient Surface Water Inundation Prediction
Abstract. Predicting surface water inundation at regional to global scales presents a fundamental tension: bespoke local models achieve high accuracy but require proprietary data and are difficult to scale, while globally trained systems offer broad coverage but demand substantial computational infrastructure and may lack flexibility for regional customization. We present the Basin-Aware Global Inundation Modeling framework (BAGIM), which addresses this gap by combining globally available, freely accessible datasets with basin-scale calibration to capture regional hydrological specificity. We evaluate six model architectures across eight geographically diverse basins to test three hypotheses: (1) that hydrologically meaningful feature engineering is more impactful than architectural complexity, (2) that basin-scale training mitigates regional biases in global datasets, and (3) that basin-aware models can generalize to extreme events beyond the training distribution. Our experiments demonstrate that tree-based ensembles (XGBoost, Random Forest) consistently outperform more complex deep learning architectures, achieving median F1 scores of approximately 0.5 against OPERA DSWx-S1 reference data, performance that approaches the inherent uncertainty ceiling imposed by disagreement among remote sensing products themselves in settings with small, shallow, and intermittent water bodies. We find that features commonly assumed essential for operational flood forecasting (i.e., coincident river-basin streamflow, Height Above Nearest Drainage, and elevation) are neither sufficient nor strictly necessary for reliable prediction, with well-engineered meteorological and terrain features achieving comparable performance without explicit streamflow inputs. This challenges a core assumption underlying many current operational flood forecasting systems. Cross-basin transfer experiments reveal limited transferability, reinforcing the importance of basin-aware calibration. 
Further, models trained exclusively on non-extreme events produce directionally correct predictions for out-of-sample extremes, though with conservative bias (higher precision, lower recall). We suggest that a design philosophy prioritizing feature engineering and regional calibration over architectural complexity enables accessible deployment without sacrificing predictive skill.
This manuscript develops a basin-aware machine learning framework (BAGIM) that combines globally available open-data products with basin-scale calibration to predict daily binary surface water extent at 30 m resolution. The authors evaluate six ML architectures across eight geographically diverse basins, test three hypotheses (H1: feature engineering matters more than architectural complexity; H2: basin-scale training mitigates regional bias; H3: basin-aware models generalize to extreme events), and report that tree-based ensembles (XGBoost, Random Forest) consistently outperform more complex deep-learning architectures. They also find that the models transfer poorly across basins and that streamflow inputs are not necessary for reliable prediction.
However, I have several major concerns that the authors need to address before acceptance.
Recommendation: Major revisions.
Comment-1
The training labels for inundation come from OPERA DSWx-S1, which detects all open inland water on a given day without distinguishing flood inundation from permanent water; this is a major drawback of the workflow. Over a 14-month observation window, most positive-class pixels will be permanent and semi-permanent water bodies whose locations do not change (how, then, can the model learn seasonality?). Under this interpretation, the H3 experiment mainly tests whether a model trained on surface water masks can predict flood inundation, so the observed pattern (precision increases, recall decreases) is unsurprising: the model correctly identifies water that was present before, during, and after the flood (high precision in the channel) and misses floodplain inundation that did not exist in the training distribution. In my understanding, this could also cause a severe class imbalance between wet and dry pixels. This analysis needs to be rechecked, and the models possibly rerun, with proper justification.
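One way to carry out this recheck is to stratify the evaluation by how often each pixel is labelled as water. The following is a minimal sketch in Python/NumPy, not the authors' pipeline: the label encoding (1 = water, 0 = dry, -1 = no data), the 0.9 permanence threshold, the function names, and the toy arrays are all my assumptions for illustration and are not taken from the manuscript or from the DSWx-S1 product specification.

```python
import numpy as np

def occurrence_frequency(masks):
    """Per-pixel fraction of valid observations labelled as water.

    masks: (T, H, W) integer array; ASSUMED encoding 1 = water, 0 = dry, -1 = no data.
    Returns (freq, n_valid); freq is NaN where a pixel was never validly observed.
    """
    valid = masks >= 0
    wet = masks == 1
    n_valid = valid.sum(axis=0)
    freq = np.full(masks.shape[1:], np.nan)
    np.divide(wet.sum(axis=0), n_valid, out=freq, where=n_valid > 0)
    return freq, n_valid

def f1(pred, ref, subset):
    """Binary F1 of pred against ref, restricted to pixels where subset is True."""
    p = pred[subset].astype(bool)
    r = ref[subset].astype(bool)
    tp = np.sum(p & r)
    fp = np.sum(p & ~r)
    fn = np.sum(~p & r)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else float("nan")

# Toy label stack (hypothetical): 4 dates, 2 x 3 pixels.
masks = np.array([
    [[1, 0, 0], [-1, 1, 0]],
    [[1, 1, 0], [-1, 0, 0]],
    [[1, 0, 0], [-1, 1, 0]],
    [[1, 0, 0], [-1, 0, 0]],
])
freq, n_valid = occurrence_frequency(masks)
observed = n_valid > 0
# ASSUMED threshold: a pixel wet in at least 90% of valid observations is "permanent".
permanent = observed & (np.nan_to_num(freq, nan=0.0) >= 0.9)
non_permanent = observed & ~permanent

# Wet-to-dry ratio over all valid training labels.
wet_to_dry = (masks == 1).sum() / (masks == 0).sum()

# Hypothetical flood day: the reference shows newly inundated pixels,
# while the prediction only reproduces the permanent water body.
ref = np.array([[1, 1, 1], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 0, 0]])

f1_all = f1(pred, ref, observed)
f1_permanent = f1(pred, ref, permanent)
f1_non_permanent = f1(pred, ref, non_permanent)
```

In this toy case, a predictor that only reproduces permanent water scores F1 = 0.4 over all observed pixels, 1.0 on permanent pixels, and 0.0 on non-permanent pixels. Reporting the H3 metrics separately for these strata, together with the wet-to-dry ratio of the training labels, would show whether the reported skill reflects flood inundation or static water.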
Comment-2
The authors also claim that streamflow features are “neither sufficient nor strictly necessary for reliable prediction” and position this finding as a challenge to operational streamflow-driven flood forecasting. However, the ERA5 meteorological forcings and HydroATLAS-derived catchment attributes remain in the feature set of the “No Streamflow” configuration. Removing modeled streamflow therefore does not remove independent information from the model. Streamflow at a given location is a lagged response to upstream rainfall; if the inundation model already has access to the rainfall, temperature, and catchment-attribute features, it can learn the rainfall-runoff transformation internally. I suggest that the authors either repeat the experiment with the meteorological forcings removed and report the results, or properly justify the claim.
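The redundancy argument can be illustrated with a deliberately simple toy example. The sketch below (Python/NumPy) uses synthetic rainfall and an assumed linear unit-hydrograph response, so it says nothing about the authors' data or models; the kernel shape, lag count, and variable names are hypothetical. It only shows that when streamflow is a lagged function of rainfall, a model given the lagged rainfall can reconstruct streamflow without receiving it as an input.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_lags = 400, 10

# Synthetic daily rainfall (toy data, NOT ERA5).
rain = rng.gamma(shape=0.5, scale=4.0, size=n_days)

# ASSUMED toy unit hydrograph: streamflow on day n is a weighted sum of
# rainfall on days n, n-1, ..., n-(n_lags-1).
kernel = np.exp(-np.arange(n_lags) / 3.0)
kernel /= kernel.sum()
flow = np.convolve(rain, kernel)[:n_days]

# Lagged-rainfall design matrix: column k holds rainfall from k days earlier,
# padded with zeros before the start of the record.
X = np.column_stack(
    [np.concatenate([np.zeros(k), rain[: n_days - k]]) for k in range(n_lags)]
)

# Ordinary least squares: recover streamflow from lagged rainfall alone.
coef, *_ = np.linalg.lstsq(X, flow, rcond=None)
pred = X @ coef
r2 = 1.0 - np.sum((flow - pred) ** 2) / np.sum((flow - flow.mean()) ** 2)
```

Here the fitted coefficients reproduce the assumed kernel and the coefficient of determination is essentially 1, because the toy streamflow carries no information beyond the rainfall history. Real rainfall-runoff behaviour is nonlinear and spatially distributed, so the redundancy will be weaker in practice, which is why an ablation that also removes the meteorological forcings is needed to quantify it.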
Comment-3
The manuscript finds that BAGIM models do not generalize across basins. To deploy BAGIM in a new basin, a user must therefore retrain the model, which requires extensive work (obtaining OPERA DSWx-S1 observations, constructing the full 30-band static feature stack, extracting ERA5 forcings, running the LSTM-FiLM streamflow model, performing cross-validation, etc.). By contrast, Google Flood Hub and GloFAS are trained once globally and deployed everywhere with effectively zero local setup. How do the authors reconcile this with their claim that the framework enables accessible deployment?