This work is distributed under the Creative Commons Attribution 4.0 License.
Assessment of machine learning-based approaches to improve sub-seasonal to seasonal forecasting of precipitation in Senegal
Abstract. In Senegal, the West African monsoon (WAM) season is characterized by pronounced subseasonal to seasonal (S2S) rainfall fluctuations in response to complex interactions between large-scale atmospheric and oceanic variability patterns and mesoscale convective systems. However, the general circulation models (GCMs) used in the development of S2S forecasting systems often struggle to represent the mechanisms yielding WAM predictability. This study explores the potential of machine learning (ML) approaches to improve S2S precipitation forecasting in Senegal. We evaluate a set of ML models, including ridge regression, linear regression, random forest, support vector machine, AdaBoost, and multilayer perceptron, for S2S forecasting of precipitation during the monsoon season. To this end, we use a combination of high-resolution global precipitation estimates from ground and satellite observations, along with atmospheric and oceanic reanalysis products. Our methodology relies on a non-filtering approach to extract significant S2S signals as predictors, enabling real-time application. We demonstrate that integrating different predictor variables from a range of atmospheric and oceanic fields significantly enhances prediction skill. Notably, the ridge regression model outperforms state-of-the-art GCM-derived S2S predictions. The study highlights the potential for developing operational S2S forecasting systems for West African precipitation using ML techniques to complement GCM-based forecast systems, offering valuable tools for climate risk anticipation and water resource management. Such ML-based systems not only provide skillful predictions but are also computationally more efficient than GCMs, and can be extended to diverse climatic zones.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-4040', Anonymous Referee #1, 01 May 2025

This paper makes an attempt at exploring the predictability of West African monsoon rainfall over Senegal. Specifically, the authors focus on weekly rainfall prediction based on ocean- and atmosphere-based predictors observed at different lead periods (0 to 5 weeks). They carry out a correlation analysis of the different predictors with the target variable (precipitation) and identify the most significant ones. They then compare a number of standard ML models/algorithms, such as linear regression, ridge regression, MLP, and SVM, evaluating their predictive accuracy for weekly rainfall against operational numerical forecast models such as NCEP, ECMWF, and UKMO, and identify ridge regression as the most successful.

While I appreciate the general aims and overall approach of the work, I think some improvements are necessary to make it more sound:

1) The work focuses on weekly rainfall. Are the weeks non-overlapping (calendar weeks) or overlapping (7-day sliding windows)? I think the second is the better approach. Also, in different regions the variability is best captured by windows of different sizes; the authors may consider 5-day or 10-day windows to see whether the results improve.

2) The spatial maps of Figs. 8-10 are good for visualization, but they are difficult to compare across different lead weeks: in Figs. 8, 9, etc., the maps for different weeks look almost the same. Some objective, numerical measure is needed to compare them.

3) According to the tables in Fig. 7, the predictability comes mainly from OLR. The other predictors seem to add little or no value, as the combined impact of all predictors mirrors the individual impact of OLR at all lead times; only SST and U200 at a 5-week lead show some significant score in terms of correlation (not MAE). But as far as I know, OLR is a derived variable that is essentially a proxy for cloud cover. Hence OLR's relation with precipitation is largely associational, and I doubt it can be considered a predictor in the "forecasting" sense. I therefore find the results of Fig. 7 rather weak.

4) The predictive models considered in this study, such as SVM and ridge regression, are dated in current ML research because of their limited predictive power: they cannot represent the kind of complex functions that may arise in the natural sciences. While such models are still useful when predicting independently at each location, a better approach may be to predict precipitation for the entire region (all locations) jointly, for which deep neural networks such as CNNs may be preferred.
- AC1: 'Reply on RC1', Dioumacor FAYE, 17 May 2025
We sincerely thank you for your thorough review of our manuscript and for your insightful suggestions.
1- Regarding your question, we confirm that the weeks considered in our study are defined as sliding 7-day windows rather than fixed calendar weeks. This approach was chosen to better capture the intra-seasonal variability of rainfall. We also fully agree with your comment about exploring different window sizes (such as 5 or 10 days); this is a very relevant suggestion, especially for capturing regional variability more effectively, and we plan to consider it as a perspective for future work and methodological improvement.

2- We acknowledge that these maps may appear visually similar across successive lead times, which can make interpretation challenging. To address this, we performed regional averaging of the results as a function of forecast lead time, presented in the figure comparing machine learning models with S2S models (Figure 11). This approach was intended to provide a clearer and more quantitative comparison across models and lead times.
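As an illustration of the sliding 7-day windows described in point 1, here is a minimal sketch on synthetic data (hypothetical names; not the study's actual code):

```python
import numpy as np
import pandas as pd

# Synthetic daily precipitation (mm/day) for one monsoon season (JJAS).
rng = np.random.default_rng(0)
dates = pd.date_range("2000-06-01", "2000-09-30", freq="D")
daily = pd.Series(rng.gamma(shape=0.8, scale=6.0, size=len(dates)), index=dates)

# Sliding 7-day windows: one weekly mean per day (trailing window),
# rather than one value per fixed calendar week.
weekly_sliding = daily.rolling(window=7).mean().dropna()

# Fixed 7-day blocks for comparison: ~17 values for the season,
# versus one sample per day with the sliding definition.
weekly_fixed = daily.resample("7D").mean()
print(len(weekly_sliding), len(weekly_fixed))
```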
3- We confirm that OLR emerges as the most influential predictor in our results, which can be attributed to its strong statistical association with convection and rainfall.
Although OLR is often considered a proxy for cloud cover, its relationship with precipitation is not merely associative: it reflects fundamental physical mechanisms, particularly in tropical regions. Low OLR values typically indicate cold, high cloud tops characteristic of the deep convective systems that produce intense rainfall. This connection is grounded in the thermodynamics of the tropical atmosphere, where deep convection is tightly linked to heavy precipitation. Moreover, in a machine learning framework, the dominance of a single predictor does not necessarily imply that the others are irrelevant; additional predictors may still contribute meaningfully through interactions or in specific spatiotemporal contexts. In the revised version of the manuscript, we plan to incorporate a SHAP (SHapley Additive exPlanations) analysis to more thoroughly assess the contextual contribution of each predictor and their interactions within the models.

4- In this study, we deliberately began with simple and computationally inexpensive models, such as linear regression, ridge regression, random forest, and SVM, in order to establish an initial analytical framework that is interpretable, reproducible, and suited to the computational resources available to us.
We are aware that these approaches have limitations when it comes to capturing the complexity of nonlinear relationships in high-dimensional environmental data. That is why we initially applied statistical computations to help reduce this complexity, as illustrated by Equations (1) to (4) in the manuscript. As part of our ongoing work, we have already started exploring deep learning approaches, particularly convolutional neural networks (CNNs), which provide a suitable framework for joint and spatial modeling of precipitation at the regional scale. Moreover, our results show that in this specific case, simple regression models (such as ridge and linear regression) have often outperformed more complex models (such as Random Forest or AdaBoost), which highlights the importance of aligning model complexity with the structure of the available data.
We will include this discussion in the revised version of the manuscript and will continue evaluating more advanced models in our future research.

Citation: https://doi.org/10.5194/egusphere-2024-4040-AC1
- RC2: 'Comment on egusphere-2024-4040', Anonymous Referee #2, 10 Sep 2025
This manuscript presents an assessment of machine learning approaches for subseasonal to seasonal (S2S) precipitation forecasting in Senegal, comparing six ML algorithms against operational S2S models. While the research addresses an important problem in West African climate prediction, the study contains several fundamental methodological flaws that severely restrict its contribution to the field.
The absence of proper baseline comparisons, uninterpretable evaluation metrics, and limited methodological scope render the main claims unsupported by the evidence presented. However, the authors could potentially address these issues through major revisions for future submission.
Major Issues
- Inadequate Evaluation Framework and Missing Baseline Comparisons
The most critical flaw in this study is the inadequate evaluation framework that makes it impossible to assess whether the ML models provide genuine forecasting skill. Several fundamental problems undermine the evaluation:
Meaningless Performance Metrics:
- The reported MAE values (0.5-0.8) lack proper context and are essentially uninterpretable
- Without knowledge of typical precipitation anomaly magnitudes and variability in Senegal, readers cannot assess whether these values represent good or poor performance
- The absence of skill scores or normalized error measures prevents determination of practical forecast value
- An MAE of 0.7 could represent excellent or terrible performance depending on the natural variability of the system
Missing Baseline Comparisons: The study fails to compare against fundamental baseline forecasts, including:
- Climatological forecasts (seasonal mean precipitation)
- Persistence models (assuming recent conditions continue)
- Simple statistical models (e.g., regression against climate indices)
- Random forecasts
Without these comparisons, there is no evidence that the ML models provide any skill beyond naive forecasting methods.
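For concreteness, a minimal sketch of the first two of these baselines on a weekly anomaly series (synthetic data; all names hypothetical, for illustration only):

```python
import numpy as np

def mae(pred, obs):
    return np.mean(np.abs(pred - obs))

# Hypothetical weekly precipitation-anomaly series, split chronologically.
rng = np.random.default_rng(1)
series = rng.normal(0.0, 1.5, size=400)   # stand-in for observed anomalies
train, test = series[:300], series[300:]

# Climatological baseline: always predict the training-period mean anomaly.
clim_pred = np.full_like(test, train.mean())

# Persistence baseline at a lead of k weeks: reuse the anomaly observed
# k steps earlier (the last k training values pad the start of the test set).
k = 2
persist_pred = series[300 - k : 400 - k]

print("MAE climatology:", mae(clim_pred, test))
print("MAE persistence:", mae(persist_pred, test))
```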
Limited Discriminatory Power: The fact that differences between linear regression and other methods are not statistically significant suggests that the power of machine learning has not been fully utilized. This indicates that simpler approaches would likely yield performance metrics in the same range as the reported results.
- Methodological Weaknesses in Predictor Selection and Feature Engineering
Several aspects of the ML methodology raise significant concerns:
Predictor Analysis - Strengths and Weaknesses:
- Strength: The detailed linear correlation analysis with important predictor variables for S2S forecasting (such as sea surface temperatures, atmospheric circulation patterns, and teleconnection indices) represents a thorough foundation for understanding climate relationships in the West African region
- Weakness: However, over-reliance on linear correlation analysis may miss important non-linear climate relationships crucial for understanding monsoon dynamics
- Critical Gap: While the correlation analysis is comprehensive, it remains unclear how features derived from these correlations are actually extracted and formatted as inputs for the ML models
- The exact dimensionality and structure of inputs to ML models remains unclear
- Ambiguity regarding temporal input structure: Are you using only Week 2 forecasts, or Weeks 2-5? What are the actual model inputs?
- The transition from correlation patterns to ML model features lacks transparency
Insufficient Feature Engineering:
- The covariance-based aggregation method needs better justification and comparison with alternative approaches
- Lack of discussion on feature scaling, multicollinearity assessment, or advanced feature selection techniques
- No exploration of techniques that could improve model performance
Validation Concerns:
- While the non-filtering method for extracting intraseasonal signals is practical for real-time applications, its effectiveness compared to established bandpass filtering methods requires thorough validation
- Missing computational details regarding processing time, computational requirements, and practical implementation considerations
Missing Technical Details:
- Insufficient discussion of computational requirements
- Lack of processing time comparisons between ML and GCM approaches
- Limited practical implementation considerations
Other Points:
- Many aspects of well-known ML models are described unnecessarily in the main text
- These methodological details could be moved to supplementary information
- The main paper should focus more directly on results and novel contributions
- The figures are difficult to read and interpret.
Overall, while this research addresses an important regional climate prediction challenge, the current manuscript requires substantial methodological improvements before it can make a meaningful contribution to the field.
Citation: https://doi.org/10.5194/egusphere-2024-4040-RC2
- AC3: 'Reply on RC2', Dioumacor FAYE, 12 Sep 2025
We sincerely thank you for your thorough review of our manuscript and for your insightful suggestions.
Comment 1: MAE not interpretable without context on natural variability.

The manuscript defines the MAE in Section 2.2.4 (Eq. 8) and uses it to evaluate performance (Figure 6). To contextualize the predictive performance of the models and provide essential reference points, we calculated the standard deviation of the observed precipitation anomalies and normalized the error metrics to make them interpretable. Furthermore, we introduced a skill score (SS) based on MAE, defined as SS = 1 - (MAE_model / MAE_reference), using climatology as the benchmark.
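For illustration, a minimal sketch of this contextualization, assuming the weekly anomalies are available as numpy arrays (hypothetical names; not the manuscript's actual code):

```python
import numpy as np

def mae(pred, obs):
    return np.mean(np.abs(pred - obs))

# Hypothetical observed and predicted weekly precipitation anomalies.
rng = np.random.default_rng(2)
obs = rng.normal(0.0, 1.2, size=200)               # spread ~ natural variability
model_pred = obs + rng.normal(0.0, 0.7, size=200)  # imperfect model forecast

mae_model = mae(model_pred, obs)
mae_ref = mae(np.full_like(obs, obs.mean()), obs)  # climatology as reference

nmae = mae_model / obs.std()     # error relative to observed variability
ss = 1.0 - mae_model / mae_ref   # SS > 0: skill beyond climatology
print(f"nMAE = {nmae:.2f}, SS = {ss:.2f}")
```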
Comment 2: Little difference between linear regression and more complex methods (RF, MLP, etc.).

Figure 6 (Section 3.2) indeed shows that ridge regression and simple linear regression achieve the best scores, while more complex methods such as RF, SVM, or MLP display slightly lower performance. This finding, already highlighted in the manuscript, indicates that linear models (with regularization) dominate in our case. We nevertheless acknowledge that it is worth discussing the possible reasons for this limited gap. Several explanations can be put forward: on the one hand, the key atmospheric signals may be largely linear in nature; on the other hand, we applied statistical calculations beforehand to reduce data complexity, as illustrated by Equations (1)-(4) in the manuscript. This preprocessing likely helps explain why linear models such as ridge regression and simple regression perform better, while more sophisticated non-linear models provide no significant additional gains in this context. We will expand on these points in the revised version to strengthen the discussion, without altering the presented results.
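As a hedged illustration of why the gap can remain small when the underlying signal is largely linear, a synthetic comparison under chronological cross-validation (not the study's actual pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic stand-in: a handful of largely linear predictors plus noise,
# mimicking one aggregated predictor per atmospheric/oceanic field.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = X @ np.array([0.8, 0.3, 0.0, 0.2, 0.0, 0.1]) + rng.normal(0.0, 1.0, size=400)

# Chronological splits avoid training on the future of the test period.
cv = TimeSeriesSplit(n_splits=5)
for model in (LinearRegression(), Ridge(alpha=10.0)):
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_absolute_error")
    print(type(model).__name__, round(-scores.mean(), 3))
```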
Comment 3: Excessive use of linear correlations; lack of clarity on the structure of predictors and input data.

We appreciate this remark. We have clarified that the correlation analysis conducted in Sect. 2.2.2 primarily serves an exploratory and illustrative purpose (Figs. 2-4), and that the final construction of predictors does not rely solely on these correlations. Following Li et al. (2022), we use a method based on spatio-temporal covariance. We have added a more detailed explanation to the manuscript (see Eqs. 4-5 in Sect. 2.2.2): each predictor X_k is defined as the sum of the 10-60 day signals weighted by their covariance with precipitation in Senegal. Formally, in Equation (5), the sum is taken only over the grid points i where the correlation is significant at the 5% level. This procedure ensures that only signals genuinely related to Senegalese precipitation (and not random noise) are aggregated, reducing uncertainty in the choice of predictors. We have clarified this mechanism and cited Li et al. (2022) to justify it, as well as the use of the statistical threshold. Section 2.2.3 already mentions that a single predictor is defined for each field and each preceding week; we will further strengthen this explanation to clarify the dimension and structure of the input dataset.
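To make this aggregation step concrete, a minimal numpy sketch of our reading of Eq. (5) (synthetic arrays; hypothetical names, not the manuscript's code):

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: 10-60 day signals of one field over a flattened
# lat-lon grid (n_time x n_points) and the Senegal-mean precipitation signal.
rng = np.random.default_rng(4)
n_time, n_points = 300, 500
field = rng.normal(size=(n_time, n_points))
precip = field[:, :20].mean(axis=1) + rng.normal(0.0, 1.0, size=n_time)

# Covariance and correlation of every grid point with Senegal precipitation.
fa = field - field.mean(axis=0)
pa = precip - precip.mean()
cov = (fa * pa[:, None]).mean(axis=0)
r = cov / (field.std(axis=0) * precip.std())

# Two-sided t-test: keep only grid points significant at the 5% level.
t = r * np.sqrt((n_time - 2) / (1.0 - r**2))
sig = np.abs(t) > stats.t.ppf(0.975, df=n_time - 2)

# Predictor X_k: covariance-weighted sum of the significant signals,
# giving one scalar predictor per time step for this field.
X_k = (field[:, sig] * cov[sig]).sum(axis=1)
print(X_k.shape)
```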
Comment 4: Weaknesses in feature engineering: no information on scaling, multicollinearity, or selection techniques.

The current version does not explicitly specify the normalization of the input variables. We will address this by indicating that the ISO signals of the atmospheric fields and precipitation were normalized using the Yeo and Johnson (2000) method, which ensures a more suitable distribution and a comparable scale across predictors. Regarding collinearity, Section 2.2.3 mentions that models such as ridge regression are used specifically to handle multicollinearity among predictors. We will emphasize this point and add a sentence clarifying that, in addition, significance-based filtering (correlations at the 5% level) is applied during signal selection. Finally, we will provide more detail on our predictor selection method (e.g., statistical threshold or other criteria), which is already implemented through the correlation analysis described in Sect. 2.2.2, in order to fully address this comment.
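For reference, this normalization is available off the shelf; a minimal sketch with a hypothetical predictor matrix:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical skewed predictor matrix, one column per aggregated field.
rng = np.random.default_rng(5)
X = rng.gamma(shape=1.5, scale=2.0, size=(300, 4))

# Yeo-Johnson (Yeo and Johnson, 2000) handles zero/negative values and,
# with standardize=True, rescales each column to zero mean and unit variance.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
X_norm = pt.fit_transform(X)
print(X_norm.mean(axis=0).round(2), X_norm.std(axis=0).round(2))
```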
Comment 5: Unfiltered validation to be compared with classical filtering methods (FFT, band-pass).

The manuscript describes our non-filtering approach in Section 2.2.1: we avoid traditional FFT filtering (which requires future information) to extract the 10-60 day signals, as explained by Li et al. (2022). We did not carry out a quantitative comparison with classical filters in this study, but the suggestion is relevant. In the revised version, we will add a note explaining that our choice is motivated by real-time applicability (as emphasized in the manuscript), and we could consider performing, or at least qualitatively discussing, a test with a standard band-pass filter. If possible, we might also indicate expectations (e.g., FFT filtering could slightly improve accuracy but would not be usable in real time).
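The exact recipe of Li et al. (2022) is not reproduced here, but as an assumption-laden illustration of the real-time constraint, a crude 10-60 day extraction can be built from trailing running means alone:

```python
import numpy as np
import pandas as pd

# Hypothetical daily anomaly series (climatological annual cycle removed).
rng = np.random.default_rng(6)
x = pd.Series(rng.normal(size=2000),
              index=pd.date_range("2015-01-01", periods=2000, freq="D"))

# Trailing running means use only past values, so the estimate is available
# in real time, unlike an FFT band-pass that needs the full future series.
high_cut = x.rolling(10).mean()  # damps variability faster than ~10 days
low_cut = x.rolling(60).mean()   # isolates variability slower than ~60 days
iso_10_60 = (high_cut - low_cut).dropna()
print(iso_10_60.tail(3))
```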
Comment 6: Lack of information on computational costs / comparisons between GCM and ML.

In the abstract, we briefly noted that ML methods are more computationally efficient than GCMs, but we did not quantify this. The point is valid: a simplified comparison (e.g., training time on our data vs. runtime of a GCM simulation) could be informative. We will add a qualitative discussion of this topic in the revision, specifying for instance that training one of our ML models (a few minutes to hours on a PC) is much faster than running a GCM simulation (multi-member ensembles on a supercomputer). At the very least, we will expand the text to make the abstract's statement more explicit and cite references or benchmark figures on the order of magnitude of the costs.
Comment 7: Figures are difficult to read, and the main text is overloaded with methodological details.

We take note of this stylistic remark. We will improve the readability of the figures by enlarging fonts, simplifying captions, and revising color palettes or contrasts where necessary. Regarding the text, we will agree with our co-authors to move some overly detailed descriptions of the algorithms to the appendix or to shorten them, in order to make the main text more fluid. We will also rephrase and condense certain methodological sections to lighten the manuscript without altering its substance.
Citation: https://doi.org/10.5194/egusphere-2024-4040-AC3
4) The predictive models considered in this study, like SVM and ridge regression are outdated in current ML research due to their limited predictive power as they cannot represent the kind of complex functions that may arise in the natural sciences. While such models are still useful when we are predicting independently for each location, a better approach may be to jointly predict the precipitation for the entire region (all locations) jointly. For that, we may prefer to use deep neural networks like CNNs.