the Creative Commons Attribution 4.0 License.
Assessment of machine learning-based approaches to improve sub-seasonal to seasonal forecasting of precipitation in Senegal
Abstract. In Senegal, the West African monsoon (WAM) season is characterized by pronounced subseasonal to seasonal (S2S) rainfall fluctuations in response to complex interactions between large-scale atmospheric and oceanic variability patterns and mesoscale convective systems. The general circulation models (GCMs) used in the development of S2S forecasting systems often struggle to represent the mechanisms that give rise to WAM predictability. This study explores the potential of machine learning (ML) approaches to improve S2S precipitation forecasting in Senegal. We evaluate a set of ML models, including ridge regression, linear regression, random forest, support vector machine, AdaBoost, and multilayer perceptron, for S2S forecasting of precipitation during the monsoon season. To this end, we use a combination of high-resolution global precipitation estimates from ground and satellite observations, along with atmospheric and oceanic reanalysis products. Our methodology relies on a non-filtering approach to extract significant S2S signals as predictors, enabling real-time application. We demonstrate that integrating different predictor variables from a range of atmospheric and oceanic fields significantly enhances prediction skill. Notably, the ridge regression model outperforms state-of-the-art GCM-derived S2S predictions. The study highlights the potential for developing operational S2S forecasting systems for West African precipitation using ML techniques to complement GCM-based forecast systems, offering valuable tools for climate risk anticipation and water resource management. Such ML-based systems not only provide skillful predictions but are also computationally more efficient than GCMs, and they can be extended to diverse climatic zones.
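The abstract singles out ridge regression as the best-performing model. As an illustration only, a minimal closed-form ridge regression in Python with NumPy might look like the sketch below. The synthetic predictors, the penalty `alpha=1.0`, the function names `fit_ridge`/`predict_ridge`, and the 240/60 chronological split are all assumptions, not the authors' configuration.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Minimizes ||y - b - X @ w||^2 + alpha * ||w||^2. Centering X and y first
    means only the slopes w are shrunk, not the intercept b.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Normal equations of the penalized problem: (Xc^T Xc + alpha*I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_ridge(X, w, b):
    return X @ w + b


# Synthetic stand-in data: 300 weeks x 4 predictors (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
true_w = np.array([1.5, 0.0, -0.5, 0.2])
y = X @ true_w + 2.0 + rng.normal(scale=0.5, size=300)

# Chronological split: fit on the first 240 weeks, evaluate on the last 60.
w, b = fit_ridge(X[:240], y[:240], alpha=1.0)
pred = predict_ridge(X[240:], w, b)
corr = np.corrcoef(pred, y[240:])[0, 1]
mae = np.mean(np.abs(pred - y[240:]))
print(f"test correlation = {corr:.2f}, MAE = {mae:.2f}")
```

With `alpha=0` the fit reduces to ordinary least squares with an intercept; increasing `alpha` shrinks the slopes toward zero, which is what stabilizes the fit when predictors are many or correlated.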
Status: open (extended)
RC1: 'Comment on egusphere-2024-4040', Anonymous Referee #1, 01 May 2025
This paper attempts to explore the predictability of West African Monsoon rainfall over Senegal. Specifically, the authors have focused on weekly rainfall prediction based on ocean- and atmosphere-based predictors observed at different lead times (0 to 5 weeks). They have carried out a correlation analysis between the different predictors and the target variable (precipitation) and identified the most significant predictors. They have then compared the weekly rainfall prediction accuracy of a number of standard ML models/algorithms, such as LR, ridge regression, MLP, and SVM, against that of operational numerical forecast models such as NCEP, ECMWF, and UKMO, and identified ridge regression as the most successful.
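The lagged correlation screening summarized above can be sketched as follows, assuming the predictor and target are weekly series already aligned on a common time axis. The function name `lagged_correlations` and the synthetic data are illustrative; the significance testing used in the paper is not reproduced here.

```python
import numpy as np


def lagged_correlations(predictor, target, max_lead=5):
    """Pearson r between the predictor in week t and the target in week t + lead.

    Both inputs are 1-D weekly series on the same time axis.
    Returns {lead: r} for lead = 0 .. max_lead weeks.
    """
    n = len(predictor)
    scores = {}
    for lead in range(max_lead + 1):
        x = predictor[: n - lead]  # predictor known `lead` weeks earlier
        y = target[lead:]          # target week being forecast
        scores[lead] = float(np.corrcoef(x, y)[0, 1])
    return scores


# Synthetic check: the target responds to the predictor two weeks later.
rng = np.random.default_rng(1)
p = rng.normal(size=500)
t = np.zeros(500)
t[2:] = p[:-2]
t += 0.3 * rng.normal(size=500)

r = lagged_correlations(p, t)
best_lead = max(r, key=lambda k: abs(r[k]))
print(best_lead, round(r[best_lead], 2))
```

Slicing with `n - lead` rather than `-lead` avoids the empty-slice pitfall of `predictor[:-0]` at lead 0.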
While I appreciate the general aims and overall approach of the work, I think some improvements are necessary to make the work more sound:
1) The work focuses on weekly rainfall. Are we considering non-overlapping weeks (according to the calendar) or overlapping weeks (7-day sliding windows)? I think the latter is the better approach. Also, the variability in different regions is best captured by windows of different sizes. The authors may consider 5-day or 10-day windows to see whether the results improve.
2) The spatial maps in Figs. 8-10 are good for visualization, but they are difficult to compare across lead weeks. For example, in Figs. 8 and 9, the spatial maps for different weeks look almost the same. There should be some objective, numerical measure to compare them.
3) According to the tables in Fig. 7, the predictability comes mainly from OLR. Other predictors seem to add little or no value, as the combined impact of all predictors mirrors the individual impact of OLR at all lead times. Only SST and U200 at a 5-week lead show some significant score in terms of correlation (but not MAE). However, as far as I know, OLR is a derived variable that is essentially a proxy for cloud cover. Hence, OLR's relation with precipitation is largely associational, and I doubt whether it can be considered a predictor in the "forecasting" sense. So I feel the results of Fig. 7 are rather weak.
4) The predictive models considered in this study, such as SVM and ridge regression, are outdated in current ML research because of their limited predictive power: they cannot represent the kind of complex functions that may arise in the natural sciences. While such models are still useful when predicting independently for each location, a better approach may be to predict the precipitation for the entire region (all locations) jointly. For that, deep neural networks such as CNNs may be preferable.
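The distinction raised in point 1 between calendar (non-overlapping) weeks and 7-day sliding windows, and the suggestion to try 5-day or 10-day windows, can be illustrated with a short NumPy sketch. It assumes a daily rainfall series for a single grid point; the function names and the 122-day June to September season are hypothetical.

```python
import numpy as np


def sliding_window_totals(daily, window=7):
    """Totals over overlapping windows that advance one day at a time.

    Element i is daily[i : i + window].sum(); output length is len(daily) - window + 1.
    """
    return np.convolve(daily, np.ones(window), mode="valid")


def block_totals(daily, window=7):
    """Totals over non-overlapping consecutive blocks; an incomplete tail is dropped."""
    n_blocks = len(daily) // window
    return daily[: n_blocks * window].reshape(n_blocks, window).sum(axis=1)


# Hypothetical 122-day season (e.g. June to September) for one grid point.
rng = np.random.default_rng(2)
season = rng.gamma(shape=0.5, scale=8.0, size=122)

for w in (5, 7, 10):
    print(w, len(sliding_window_totals(season, w)), len(block_totals(season, w)))
```

For this season length, the sliding approach yields 118, 116, and 113 samples for 5-, 7-, and 10-day windows, against 24, 17, and 12 blocks. The extra sliding samples are strongly autocorrelated (consecutive 7-day windows share six days), so skill estimates would need to account for that, for example by keeping training and test periods in separate years.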
Citation: https://doi.org/10.5194/egusphere-2024-4040-RC1
AC1: 'Reply on RC1', Dioumacor FAYE, 17 May 2025
We sincerely thank you for your thorough review of our manuscript and for your insightful suggestions.
1- Regarding your question, we confirm that the weeks considered in our study are indeed defined as sliding 7-day windows rather than fixed calendar weeks. This approach was chosen to better capture the intra-seasonal variability of rainfall. We also fully agree with your suggestion to explore different window sizes (such as 5 or 10 days). This is a very relevant suggestion, especially for capturing regional variability more effectively, and we plan to explore it in future work as a potential methodological improvement.
2- We acknowledge that these maps may appear visually similar across successive lead times, which can make interpretation challenging. To address this, we have performed regional averaging of the results as a function of forecast lead time, presented in the figure comparing the machine learning models with the S2S models (Figure 11). This approach was intended to provide a clearer and more quantitative comparison across models and lead times.
3- We confirm that OLR emerges as the most influential predictor in our results, which can be attributed to its strong statistical association with convection and rainfall.
Although OLR is often considered a proxy for cloud cover, its relationship with precipitation is not merely associative. It reflects fundamental physical mechanisms, particularly in tropical regions. Low OLR values typically indicate cold, high cloud tops, characteristic of deep convective systems that produce intense rainfall. This connection is grounded in the thermodynamics of the tropical atmosphere, where deep convection is tightly linked to heavy precipitation. Moreover, in a machine learning framework, the dominance of a single predictor does not necessarily imply that the others are irrelevant: additional predictors may still contribute meaningfully through interactions or in specific spatiotemporal contexts. In the revised version of the manuscript, we plan to incorporate a SHAP (SHapley Additive exPlanations) analysis to assess more thoroughly the contextual contribution of each predictor and their interactions within the models.
4- In this study, we deliberately began with simple and computationally inexpensive models, such as linear regression, ridge regression, Random Forest, and SVM, in order to establish an initial analytical framework that is interpretable, reproducible, and suited to the computational resources available to us.
We are aware that these approaches have limitations when it comes to capturing the complexity of nonlinear relationships in high-dimensional environmental data. That is why we initially applied statistical computations to help reduce this complexity, as illustrated by Equations (1) to (4) in the manuscript. As part of our ongoing work, we have already started exploring deep learning approaches, particularly convolutional neural networks (CNNs), which provide a suitable framework for joint and spatial modeling of precipitation at the regional scale. Moreover, our results show that in this specific case, simple regression models (such as ridge and linear regression) have often outperformed more complex models (such as Random Forest or AdaBoost), which highlights the importance of aligning model complexity with the structure of the available data.
We will include this discussion in the revised version of the manuscript and will continue evaluating more advanced models in our future research.
Citation: https://doi.org/10.5194/egusphere-2024-4040-AC1
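The regional averaging described in the authors' reply to point 2 reduces each lead week's skill map to a single number, which also gives the objective measure the referee asked for. A minimal sketch follows, assuming forecasts and observations share a common (time, lat, lon) grid and are already aligned in time. The cosine-latitude area weighting, the function names, and the synthetic forecasts are assumptions; the exact procedure behind Figure 11 is not reproduced here.

```python
import numpy as np


def gridpoint_scores(pred, obs):
    """Temporal Pearson correlation and MAE at each grid point.

    pred, obs: arrays shaped (time, lat, lon), already aligned in time.
    """
    pa = pred - pred.mean(axis=0)
    oa = obs - obs.mean(axis=0)
    corr = (pa * oa).sum(axis=0) / np.sqrt((pa ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    mae = np.abs(pred - obs).mean(axis=0)
    return corr, mae


def regional_mean(field, lats):
    """Cosine-latitude area-weighted mean of a (lat, lon) field."""
    weights = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], field.shape)
    return float((field * weights).sum() / weights.sum())


def skill_by_lead(preds_by_lead, obs, lats):
    """Reduce each lead's correlation and MAE maps to one regional value each."""
    summary = {}
    for lead, pred in sorted(preds_by_lead.items()):
        corr_map, mae_map = gridpoint_scores(pred, obs)
        summary[lead] = (regional_mean(corr_map, lats), regional_mean(mae_map, lats))
    return summary


# Synthetic example on a small grid roughly covering Senegal (12.5-16.5 N).
rng = np.random.default_rng(3)
lats = np.linspace(12.5, 16.5, 5)
obs = rng.normal(size=(100, 5, 6))
# Hypothetical forecasts whose error grows with lead time (weeks 1-4).
preds = {lead: obs + s * rng.normal(size=obs.shape)
         for lead, s in zip(range(1, 5), (0.2, 0.5, 1.0, 2.0))}

for lead, (r, mae) in skill_by_lead(preds, obs, lats).items():
    print(f"week {lead}: regional r = {r:.2f}, regional MAE = {mae:.2f}")
```

Correlation and MAE are used here because they are the two scores the referee mentions for Fig. 7; any other gridpoint score could be passed through `regional_mean` the same way.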
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
281 | 57 | 16 | 354 | 15 | 14 | 15 |