Skills in sub-seasonal to seasonal terrestrial water storage forecasting: insights from the FEWS NET land data assimilation system
Abstract. Accurate prediction of terrestrial water storage (TWS), the sum of soil moisture, groundwater, snow/ice, and surface water, is critical for informing water resource management and disaster response. In this study, we evaluated subseasonal to seasonal (S2S) TWS forecasts produced by the FEWS NET land data assimilation system (FLDAS) over Africa, using observations from the Gravity Recovery and Climate Experiment (GRACE) and its Follow-On (GRACE/FO) mission. FLDAS consists of two advanced land surface models, Noah-MP and the NASA Catchment Land Surface Model (CLSM), both of which simulate key TWS components including groundwater. Results show that CLSM is more skillful than Noah-MP in forecasting TWS anomalies at S2S scales, with relative operating characteristic (ROC) scores above 0.6 over more than half of the study domain at lead times of 1–6 months. CLSM forecasts also maintain stronger correlations with GRACE/FO data than Noah-MP forecasts, particularly at longer lead times, owing to more skillful reanalysis-based initial conditions and stronger persistence in simulated TWS. In contrast, Noah-MP forecasts show weaker skill, especially in central Africa, where the skill also declines rapidly with lead time.
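As a rough illustration of the ROC score mentioned above, the following sketch computes the area under the ROC curve for a binary event from forecast probabilities. The event definition (for example, TWS anomaly in the lower tercile), the function name `roc_area`, and the sample values are assumptions for illustration only; the abstract does not specify how the scores were computed.

```python
import numpy as np


def roc_area(event_observed, forecast_prob):
    """Area under the ROC curve for a binary event.

    Uses the pairwise (Mann-Whitney) form: the fraction of
    (event, non-event) pairs in which the event case received the
    higher forecast probability, counting ties as one half.
    """
    obs = np.asarray(event_observed, dtype=bool)
    prob = np.asarray(forecast_prob, dtype=float)
    pos = prob[obs]    # probabilities issued when the event occurred
    neg = prob[~obs]   # probabilities issued when it did not
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined without both outcomes
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Hypothetical example: four forecasts of a "dry" event at one grid cell.
observed_dry = [1, 1, 0, 0]
prob_dry = [0.9, 0.4, 0.5, 0.1]
score = roc_area(observed_dry, prob_dry)
```

A score of 0.5 corresponds to no discrimination between events and non-events, and 1.0 to perfect discrimination, which is why a threshold such as 0.6 is read as modest but useful skill.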
Evaluation results show that the accuracy of TWS forecasts is strongly influenced by interannual precipitation variability: forecasts driven by precipitation products with lower interannual variability are generally more accurate than those driven by products with higher variability. The performance gap between Noah-MP and CLSM is also more pronounced in regions with higher precipitation variability, such as central Africa. This sensitivity arises because TWS often exhibits strong multi-year variability in response to interannual precipitation, making realistic simulation of long-term variability critical for skillful TWS forecasts. The superior performance of CLSM is attributed to its strong representation of upward groundwater movement, especially during prolonged droughts, which enhances TWS interannual variability. In contrast, the weak representation of capillary rise in Noah-MP limits its ability to capture the effects of long-term precipitation variability on TWS. Both models exhibit lower correlations and higher RMSEs when evaluated against GRACE/FO data than when evaluated against reanalysis, further underscoring substantial uncertainty in model physics.
Autocorrelation analyses show that TWS persistence is closely linked to groundwater persistence. CLSM groundwater exhibits stronger persistence than that of Noah-MP, owing to its ability to simulate groundwater responses to long-term precipitation variability. While persistence provides an important source of predictability, our results also show that inaccurate persistence, such as that associated with anthropogenically induced trends and changes in precipitation that are often inadequately captured by land surface models, can degrade forecast skill. These findings underscore the importance of using independent datasets such as GRACE/FO observations to evaluate TWS forecasts.
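The persistence discussed above is commonly measured with the lagged autocorrelation of an anomaly time series. The sketch below is a minimal version of that calculation; the function name `lag_autocorrelation` and the sample series are hypothetical, and the study's exact estimator (for example, any detrending or deseasonalizing step) is not stated here.

```python
import numpy as np


def lag_autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a non-negative integer lag.

    The series is centered on its own mean, and the lagged products are
    normalized by the lag-0 sum of squares (the standard biased estimator).
    """
    x = np.asarray(series, dtype=float)
    if lag < 0 or lag >= x.size:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    x = x - x.mean()
    denom = np.sum(x * x)
    if denom == 0.0:
        return float("nan")  # constant series has no defined autocorrelation
    if lag == 0:
        return 1.0
    return float(np.sum(x[:-lag] * x[lag:]) / denom)


# Hypothetical monthly anomalies: a slowly varying and a rapidly varying series.
slow_series = [1.0, 2.0, 3.0, 4.0, 5.0]
fast_series = [1.0, -1.0, 1.0, -1.0]
slow_r1 = lag_autocorrelation(slow_series, 1)
fast_r1 = lag_autocorrelation(fast_series, 1)
```

Applied to groundwater and TWS anomalies from each model, a slower decay of this statistic with increasing lag would indicate stronger persistence. A strong trend in the series also inflates it, which is one way inaccurate persistence can arise.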
Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.
Summary: the authors offer an evaluation of FEWS NET S2S terrestrial water storage forecasts for Africa. The manuscript focuses on differences between the two land surface models included in the FEWS NET forecast ensemble (CLSM and Noah-MP) and offers commentary on the performance of each. Overall, they conclude that CLSM offers advantages when simulating and forecasting TWS. Results also show how various NMME meteorological S2S forecasts compare, but these results are not emphasized in the discussion. The primary source of evaluation data in the main text is GRACE, while information on precipitation forecasts is contained in supplementary material and is addressed only briefly in the text.
I find the results presented in the manuscript to be interesting, and the explanation of these results is generally quite clear and useful. I did find myself a bit confused at times, when the authors bounced between comparing hindcasts to reanalysis and comparing hindcasts to GRACE observations, and when some of the explanation of geographic patterns seemed to me to be speculative. But these are minor points, and I have only a few questions that I would like to see addressed before the paper is published in final form.
Specific comments:
Line 204: isn't the 1 m CLSM "soil depth" a choice that was made by the authors? This implementation of the model might output 1 m soil moisture, but the model also has an implicit soil water profile that could be used to extract an estimate of total soil moisture integrated to any depth. Similarly (and maybe more easily) the authors could have used 1 m soil moisture from Noah-MP rather than the full 2 m column. Why not compare 1 m CLSM to 1 m Noah-MP, or 2 m CLSM to 2 m Noah-MP?
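To make the suggested like-for-like comparison concrete, here is a minimal sketch of integrating layered volumetric soil moisture from the surface to a chosen depth. It assumes the common four-layer Noah-MP configuration with layer thicknesses of 0.1, 0.3, 0.6, and 1.0 m, which may differ from the setup used in the manuscript; the function name `column_water_mm` and the moisture values are hypothetical. CLSM has no discrete layers, so an equivalent estimate there would have to come from the model's own profile diagnostics.

```python
import numpy as np

# Assumed layer thicknesses (m) for a four-layer Noah-MP soil column.
NOAHMP_LAYER_THICKNESS_M = (0.1, 0.3, 0.6, 1.0)


def column_water_mm(vol_soil_moisture, layer_thickness_m, depth_m):
    """Integrate volumetric soil moisture (m3/m3) from the surface down
    to depth_m and return the equivalent water depth in millimetres.

    Layers that straddle depth_m contribute only their upper portion.
    """
    theta = np.asarray(vol_soil_moisture, dtype=float)
    thick = np.asarray(layer_thickness_m, dtype=float)
    if theta.shape != thick.shape:
        raise ValueError("one moisture value is required per layer")
    # Depth of the top of each layer below the surface.
    tops = np.concatenate(([0.0], np.cumsum(thick)[:-1]))
    # Thickness of each layer that lies above depth_m (0 if entirely below).
    used = np.clip(depth_m - tops, 0.0, thick)
    return float(np.sum(theta * used) * 1000.0)


# Hypothetical layer values for one grid cell and one time step.
theta_layers = [0.20, 0.25, 0.30, 0.35]
water_1m = column_water_mm(theta_layers, NOAHMP_LAYER_THICKNESS_M, 1.0)
water_2m = column_water_mm(theta_layers, NOAHMP_LAYER_THICKNESS_M, 2.0)
```

Under these assumptions the top three layers sum to exactly 1 m, so a 1 m Noah-MP estimate requires no interpolation within a layer.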
Lines 234-249: In Figure 2, the reanalysis errors look almost identical to the forecast errors for both Noah-MP and CLSM. Yet the authors invoke NMME uncertainties when explaining some aspects of model errors. Given that the patterns and magnitude of error appear to be very similar in reanalysis and in forecasts at all lead times, aren't these errors more about model bias than about forecasts? Even the explanations that invoke interannual climate variability seem like they'd need more evidence in their support, since we'd want to know that errors in interannual meteorological variability are seen in a similar way in both CHIRPS (or MERRA-2) and in the NMME models.
Line 285: If these results compare model forecasts to their own reanalysis, can we really say that degradation of Noah-MP forecasts is due to an "inability" to simulate long-term TWS variability? Couldn't we just as easily say that the persistence of CLSM forecasts is due to that model's "inability" to simulate rapid runoff and drainage? Without an independent evaluation dataset (for this specific result) it's not possible to know which model's behavior is better. That said, the subsequent results that *do* offer comparison with GRACE make a more convincing case. I would recommend that the authors avoid making statements about the quality of model performance when using the retrospective simulations as the truth. (In fact, they might consider moving these statements out of this section, as I admit that I was confused on my first reading about which statements had an observational basis and which were about simulation comparisons.)
Section 3.4: Why aren't any GRACE comparisons offered in this section? It seems odd to show the forecast without any evaluation.