This work is distributed under the Creative Commons Attribution 4.0 License.
From Forecast to Alert: Designing an AI-Driven Flood Early Warning System for the White Volta Basin Using Open Satellite Data
Abstract. Flood early warning in the White Volta Basin of northern Ghana is complicated by unmonitored dam releases from Burkina Faso’s Bagre Reservoir, which existing globally calibrated systems do not account for. We present an end-to-end AI-driven flood early warning system built entirely from open satellite data. An ensemble of Random Forest, XGBoost, and LSTM models trained on GRDC discharge, CHIRPS rainfall, ERA5-Land reanalysis, and a novel JRC-derived Bagre storage proxy achieved Kling-Gupta Efficiency scores of 0.984, 0.974, and 0.957 at 1-, 3-, and 5-day lead times on an independent test period, exceeding the GloFAS v2.1 African median benchmark of approximately 0.35, though direct comparison against GloFAS v4 at Nawuni was not undertaken. A four-tier alert system calibrated to 30-year flood return periods achieved a cross-validated Red-tier probability of detection of 0.902 (false alarm ratio 0.134) at one-day lead, declining to 0.762 at five days; higher-tier skill rests on leave-one-year-out cross-validation rather than held-out evidence, as the test period contains no Orange or Red events. Sentinel-1 SAR mapping confirmed that threshold exceedances correspond to observed inundation extents of 50 to 149 km². The system integrates into Ghana's existing myDEWETRA-VOLTALARM platform without requiring new institutional infrastructure.
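For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Kling-Gupta Efficiency score can be computed for an unweighted ensemble mean. It assumes the original 2009 formulation of KGE (correlation, variability ratio, bias ratio); the abstract does not say which KGE variant the authors used. The function name `kge` and all discharge values are hypothetical and serve only as illustration.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta Efficiency, 2009 formulation (assumed variant).

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
    r     : linear correlation between simulated and observed discharge
    alpha : ratio of standard deviations (sim / obs)
    beta  : ratio of means (sim / obs)
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # Drop days where either series is missing.
    mask = np.isfinite(obs) & np.isfinite(sim)
    obs, sim = obs[mask], sim[mask]
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical daily discharge in m^3/s (illustrative values, not from the paper).
observed = np.array([100.0, 250.0, 400.0, 900.0, 600.0])

# Hypothetical forecasts from three ensemble members, one row per member.
members = np.array([
    [110.0, 240.0, 420.0, 870.0, 610.0],
    [ 95.0, 260.0, 390.0, 930.0, 580.0],
    [105.0, 255.0, 410.0, 880.0, 620.0],
])

# Simple unweighted ensemble mean across members.
ensemble = members.mean(axis=0)

print(f"KGE of ensemble mean: {kge(observed, ensemble):.3f}")
```

A perfect forecast gives KGE = 1. Because the score is computed over whatever flows occur in the evaluation period, a high value on a moderate-flow period says little about skill during extreme floods.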
Status: open (until 30 Jun 2026)
RC1: 'Comment on egusphere-2026-2168', Anonymous Referee #1, 31 May 2026
# Overall Assessment

This manuscript presents an AI-driven flood early warning framework for the White Volta Basin in northern Ghana, with particular attention to the role of Bagre Dam releases. The topic is important and operationally relevant. The study has several strengths: it focuses on a data-scarce transboundary basin, combines gauge discharge, satellite rainfall, reanalysis variables, and a reservoir-state proxy, translates discharge forecasts into actionable alert tiers, and attempts to connect forecast thresholds with Sentinel-1 inundation evidence and Ghana's existing warning infrastructure. In my view, the study has publication potential. However, several important issues need to be addressed before the manuscript can be considered for acceptance. The current algorithmic positioning is too narrow for a paper framed as an "AI-driven" warning system, the strongest performance claims are not supported by direct operational benchmarks or independent high-tier flood events, and some statements about open satellite data and deployment readiness are overstated. My overall recommendation is **major revision**.

# Major Comments

1. The literature review and algorithmic positioning should be substantially strengthened. The manuscript currently positions the method mainly relative to Random Forest, XGBoost, LSTM, previous White Volta machine-learning studies, and global hydrological AI models. This is too narrow for a paper framed as an AI-driven flood early warning system. Recent work on neural-network-based flood forecasting has moved quickly, including high-resolution spatiotemporal flood nowcasting, graph-based rainfall-runoff routing, neural-operator-based flood dynamics prediction, domain adaptation, and zero-shot or transfer-based high-resolution generalization. Even if the authors intentionally choose a lightweight basin-scale discharge-to-alert architecture, they should position this choice against recent deep learning and neural operator methods.

   I encourage the authors to discuss recent representative studies such as latent autoregressive neural networks [1], deep neural operators [2, 3], geometry-informed neural operators for rapid flood forecasting [4], and LSTM-GNN routing models [5]. In particular, the revised manuscript would be stronger if the authors:

   (a) explain clearly that the present study predicts station discharge and alert tiers rather than high-resolution distributed inundation fields;
   (b) clarify why a tabular ensemble plus LSTM is appropriate for this operational setting, despite recent progress in neural operators, graph neural networks, and high-resolution spatiotemporal nowcasting;
   (c) discuss whether the proposed framework could be coupled with distributed inundation models to convert gauge-level alerts into community-scale depth, extent, road, or asset-impact warnings;
   (d) avoid implying that the proposed model represents the current frontier of AI-based flood forecasting without acknowledging these newer algorithmic directions.

   Suggested References:

   [1] Cao, X., Wang, B., Yao, Y., Zhang, L., Xing, Y., Mao, J., Zhang, R., Fu, G., Borthwick, A. G. L., and Qin, H. (2025). U-RNN high-resolution spatiotemporal nowcasting of urban flooding. *Journal of Hydrology*, 659, 133117. https://doi.org/10.1016/j.jhydrol.2025.133117

   [2] Cao, X., Yao, Y., Wang, Z., Zhao, Z., Borthwick, A. G. L., and Qin, H. (2026). Large-scale urban flood modeling and zero-shot high-resolution generalization with LarNO. *Journal of Hydrology*, 676, 135686. https://doi.org/10.1016/j.jhydrol.2026.135686

   [3] Xu, Q., De Vos, L. F., Shi, Y., Ruther, N., Bronstert, A., and Zhu, X. X. (2025). Urban flood modeling and forecasting with deep neural operator and transfer learning. *Journal of Hydrology*, 661, 133705. https://doi.org/10.1016/j.jhydrol.2025.133705

   [4] Taghizadeh, M., Zandsalimi, Z., Nabian, M. A., Goodall, J. L., and Alemazkoor, N. (2026). FloodForecaster: A domain-adaptive geometry-informed neural operator framework for rapid flood forecasting. *Journal of Hydrology*, 664, 134512. https://doi.org/10.1016/j.jhydrol.2025.134512

   [5] Mosaffa, H., Pappenberger, F., Prudhomme, C., Chantry, M., Rudiger, C., and Cloke, H. (2026). A GNN routing module is all you need for LSTM Rainfall-Runoff models. *Hydrology and Earth System Sciences*, 30, 2079-2092. https://doi.org/10.5194/hess-30-2079-2026

2. The statement that the system is built "entirely from open satellite data" is misleading. The model depends strongly on GRDC observed discharge and discharge-lag features. The manuscript reports that, in the one-day Random Forest model, the three-day rolling mean discharge accounts for 86.0% of total Gini importance and one-day lagged discharge accounts for another 13.2%, while the remaining 19 features, including CHIRPS rainfall, ERA5-Land variables, and the Bagre storage index, contribute less than 2% collectively. This means the model is primarily a gauge-conditioned discharge forecasting system augmented with satellite and reanalysis variables, not a system built entirely from satellite data. The title, abstract, and conclusions should be revised to reflect this more accurately. The authors should also discuss how the system would perform when real-time Nawuni discharge is missing, delayed, or erroneous.

3. The evaluation of Orange- and Red-tier alert skill is not yet strong enough. The independent test period is January 2004 to February 2007, and the manuscript states that the maximum test-period discharge is only 1,206 m³/s, below the Orange threshold of 1,362 m³/s. Therefore, the reported Orange- and Red-tier skill rests entirely on leave-one-year-out (LOYO) cross-validation rather than a strictly held-out extreme-flood sample. LOYO cross-validation is useful, but daily flood samples are strongly autocorrelated, and the reported 122 Red-tier "events" may represent clustered flood days rather than independent flood events. The authors should add event-based evaluation, block or event-level cross-validation, uncertainty intervals for POD/FAR/CSI, and a clear count of independent flood episodes. The abstract and conclusion should state more explicitly that high-tier skill is not independently tested on held-out Orange or Red events.

4. A direct benchmark against the current operational or state-of-the-art baseline is needed. The comparison with a published GloFAS v2.1 African median KGE is only contextual and cannot support claims that the proposed system outperforms the operational forecasting system at Nawuni. The relevant comparison is GloFAS v4 or the actual myDEWETRA-VOLTALARM forecast product at the same gauge, period, lead times, and metrics. The explanation that GloFAS GRIB2 files could not be read because of an unavailable ecCodes installation is not sufficient for a publication-level performance claim. If a direct GloFAS v4 comparison cannot be added, the authors should substantially tone down all comparative claims and present the GloFAS result only as background context.

5. The Bagre reservoir proxy is promising but its operational value remains under-validated. The manuscript presents the JRC-derived Bagre storage proxy as a key innovation, but the proxy is monthly and linearly interpolated to daily values, which introduces a lag of several weeks relative to real-time reservoir conditions. This weakens its use as an operational trigger. The authors should quantify the independent skill of the Bagre trigger itself, including POD, FAR, lead time, missed events, and false alerts across flood and non-flood years. They should also distinguish clearly between the retrospective value of the proxy and what would be available in a real-time operational system. If near-real-time Sentinel-1 reservoir monitoring or altimetry is required, this should be described as a future extension rather than as part of the currently validated system.

6. The Sentinel-1 validation should be described more cautiously. The Sentinel-1 analysis is useful as spatial evidence that high-discharge periods correspond to observable inundation, but it does not validate the lead-time skill of the forecast system. Only three post-2007 events are considered, there is no independent flood-map accuracy assessment, the discharge during these events is represented by an ERA5-Land runoff proxy rather than observed gauge data, and the 2019 SAR maximum is separated from the DFO event date by 52 days. The authors should avoid phrasing that suggests Sentinel-1 "confirms" the full warning system. A more accurate statement would be that Sentinel-1 provides limited spatial grounding for the discharge-threshold framework.

7. Reproducibility should be improved. The Zenodo archive is helpful, but the released scripts appear to use hard-coded local paths, and the workflow is not yet presented as a clean, portable reproduction package. Some archived auxiliary files also contain formatting or placeholder issues. The authors should provide a complete requirements or environment file, a clear run order, portable path handling, processed feature matrices or scripts to reproduce them, and the exact outputs used for the tables and figures. For an operational AI warning-system paper, reproducibility of the data-processing and evaluation chain is especially important.

# Minor Comments

1. The title may be too broad. If the model remains station-discharge based, the title should avoid implying a fully distributed satellite-only flood early warning system.

2. The abstract should be toned down. The reported KGE values are impressive, but they are computed over a moderate-flow test period without Orange or Red events. This caveat should appear immediately next to the performance claim.

3. The use of "probabilistic" should be clarified. The ensemble is described as a simple unweighted average of three deterministic models. If no predictive distribution or calibrated forecast probability is produced, the manuscript should avoid calling the discharge forecasts probabilistic, or should explain how probabilities are derived.

4. The alert thresholds need clearer uncertainty treatment. The LP3 fit is based on only 30 wet-season annual maxima, and the Red threshold is operationally important. Confidence intervals for threshold estimates would help emergency managers understand the robustness of the tier boundaries.

5. The manuscript should distinguish more carefully between return-period thresholds and decision thresholds. A 5-year discharge threshold may not automatically correspond to an evacuation threshold without vulnerability, exposure, evacuation feasibility, and impact data.

6. The authors should clarify whether missing GRDC discharge values were gap-filled, omitted, or handled differently across feature construction, lag generation, and model evaluation.

Citation: https://doi.org/10.5194/egusphere-2026-2168-RC1
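The event-based evaluation requested in the comment on Orange- and Red-tier skill can be sketched as follows. This is a minimal illustration and not the authors' method. It groups daily threshold exceedances into episodes, computes day-level POD, FAR, and CSI, and estimates uncertainty intervals by resampling whole years. The function names, the 7-day separation rule (`min_gap`), and the discharge series are all hypothetical; only the Orange threshold of 1,362 m³/s comes from the review text.

```python
import numpy as np

ORANGE = 1362.0  # m^3/s, Orange threshold quoted in the review

def find_episodes(exceed, min_gap=7):
    """Group exceedance days into episodes. Runs separated by fewer than
    `min_gap` non-exceedance days are merged (min_gap is an assumption)."""
    idx = np.flatnonzero(np.asarray(exceed, dtype=bool))
    if idx.size == 0:
        return []
    episodes = []
    start = prev = idx[0]
    for i in idx[1:]:
        if i - prev - 1 >= min_gap:  # gap long enough: close the episode
            episodes.append((int(start), int(prev)))
            start = i
        prev = i
    episodes.append((int(start), int(prev)))
    return episodes

def scores(obs_exceed, fc_exceed):
    """Day-level POD, FAR, and CSI from two boolean exceedance series."""
    obs = np.asarray(obs_exceed, dtype=bool)
    fc = np.asarray(fc_exceed, dtype=bool)
    hits = int(np.sum(obs & fc))
    misses = int(np.sum(obs & ~fc))
    false_alarms = int(np.sum(~obs & fc))
    pod = hits / (hits + misses) if hits + misses else np.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else np.nan
    total = hits + misses + false_alarms
    csi = hits / total if total else np.nan
    return pod, far, csi

def episode_pod(obs_exceed, fc_exceed, min_gap=7):
    """Fraction of observed episodes with at least one forecast exceedance day."""
    fc = np.asarray(fc_exceed, dtype=bool)
    eps = find_episodes(obs_exceed, min_gap)
    if not eps:
        return np.nan
    return sum(bool(fc[s:e + 1].any()) for s, e in eps) / len(eps)

def bootstrap_ci(obs_exceed, fc_exceed, years, n_boot=1000, level=0.95, seed=0):
    """Year-block bootstrap interval for (POD, FAR, CSI).
    Returns an array of shape (2, 3): lower and upper bounds."""
    obs = np.asarray(obs_exceed, dtype=bool)
    fc = np.asarray(fc_exceed, dtype=bool)
    years = np.asarray(years)
    rng = np.random.default_rng(seed)
    blocks = [np.flatnonzero(years == y) for y in np.unique(years)]
    samples = []
    for _ in range(n_boot):
        pick = rng.integers(0, len(blocks), size=len(blocks))
        idx = np.concatenate([blocks[k] for k in pick])
        samples.append(scores(obs[idx], fc[idx]))
    q = [(1 - level) / 2, 1 - (1 - level) / 2]
    return np.nanquantile(np.array(samples, dtype=float), q, axis=0)

# Hypothetical observed and forecast discharge in m^3/s (illustrative only).
obs_q = np.array([900, 1400, 1500, 1300, 1200, 1450, 800, 700,
                  600, 500, 500, 500, 500, 500, 1380, 1600], dtype=float)
fc_q = np.array([950, 1350, 1480, 1370, 1100, 1300, 820, 700,
                 650, 500, 500, 500, 500, 600, 1400, 1550], dtype=float)

print("episodes:", find_episodes(obs_q >= ORANGE))
print("POD, FAR, CSI:", scores(obs_q >= ORANGE, fc_q >= ORANGE))
print("episode POD:", episode_pod(obs_q >= ORANGE, fc_q >= ORANGE))
```

Resampling whole years rather than individual days keeps autocorrelated flood days together, which is the concern the comment raises about clustered daily samples. Reporting the number of episodes returned by `find_episodes` next to the day-level counts makes the effective sample size explicit.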
AC1: 'Reply on RC1', Joseph Obeng, 05 Jun 2026
We thank the referee for the thorough and constructive review. A detailed point-by-point response addressing all major and minor comments, including new analytical results, full inserted manuscript text, and reproduced tables, is attached as a supplementary file.
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 134 | 24 | 11 | 169 | 8 | 7 |