A Basin-Aware Global Framework for Computationally Efficient Surface Water Inundation Prediction
Abstract. Predicting surface water inundation at regional to global scales presents a fundamental tension: bespoke local models achieve high accuracy but require proprietary data and are difficult to scale, while globally trained systems offer broad coverage but demand substantial computational infrastructure and may lack flexibility for regional customization. We present the Basin-Aware Global Inundation Modeling framework (BAGIM), which addresses this gap by combining globally available, freely accessible datasets with basin-scale calibration to capture regional hydrological specificity. We evaluate six model architectures across eight geographically diverse basins to test three hypotheses: (1) that hydrologically meaningful feature engineering is more impactful than architectural complexity, (2) that basin-scale training mitigates regional biases in global datasets, and (3) that basin-aware models can generalize to extreme events beyond the training distribution. Our experiments demonstrate that tree-based ensembles (XGBoost, Random Forest) consistently outperform more complex deep learning architectures, achieving median F1 scores of approximately 0.5 against OPERA DSWx-S1 reference data. This performance approaches the inherent uncertainty ceiling imposed by disagreement among remote sensing products themselves in settings with small, shallow, and intermittent water bodies. We find that features commonly assumed essential for operational flood forecasting (i.e., coincident river-basin streamflow, Height Above Nearest Drainage, and elevation) are neither sufficient nor strictly necessary for reliable prediction: well-engineered meteorological and terrain features achieve comparable performance without explicit streamflow inputs. This finding challenges a core assumption underlying many current operational flood forecasting systems. Cross-basin transfer experiments reveal limited transferability, reinforcing the importance of basin-aware calibration.
Further, models trained exclusively on non-extreme events produce directionally correct predictions for out-of-sample extremes, though with a conservative bias (higher precision, lower recall). We suggest that a design philosophy prioritizing feature engineering and regional calibration over architectural complexity enables accessible deployment without sacrificing predictive skill.
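For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how pixel-wise precision, recall, and F1 can be computed for a pair of binary water masks. It is illustrative only and is not the evaluation code used in this work: the function name `inundation_scores`, the zero-division convention, and the toy 3x3 masks are all assumptions introduced here. The toy prediction is deliberately conservative, flagging only pixels that are truly wet while missing half of the reference water, which mirrors the "higher precision, lower recall" pattern described for out-of-sample extremes.

```python
import numpy as np


def inundation_scores(predicted, reference):
    """Pixel-wise precision, recall, and F1 for two binary water masks.

    Hypothetical helper for illustration; 1/True marks a water pixel.
    """
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)

    tp = np.sum(predicted & reference)    # water in both masks
    fp = np.sum(predicted & ~reference)   # predicted water, reference dry
    fn = np.sum(~predicted & reference)   # missed reference water

    # Assumed convention: return 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return float(precision), float(recall), float(f1)


# Toy masks with made-up values (not data from the study).
reference = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 0]])
predicted = np.array([[1, 0, 0],
                      [1, 0, 0],
                      [0, 0, 0]])

precision, recall, f1 = inundation_scores(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case every predicted water pixel is correct (precision 1.0) but only two of the four reference water pixels are recovered (recall 0.5), so F1, the harmonic mean of the two, is about 0.67.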