Improving the prediction of the Madden-Julian Oscillation of the ECMWF model by post-processing
- 1Departament de Física, Universitat Politècnica de Catalunya, Sant Nebridi 22, 08222 Terrassa, Barcelona, Spain
- 2Institute of Economics, Karlsruhe Institute of Technology, Blücherstr. 17, 76185 Karlsruhe, Germany
- 3European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
- 4Technische Universität Bergakademie Freiberg (TUBAF), Freiberg, Germany
- 5Max-Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- 6Departamento de Ciencias de la Atmósfera, Facultad de Ciencias, Universidad de la República, Igua 4225, 11400 Montevideo, Uruguay
Abstract. The Madden-Julian Oscillation (MJO) is a major source of predictability on the sub-seasonal (10- to 90-day) time scale. An improved forecast of the MJO may have important socioeconomic impacts due to the influence of the MJO on both tropical and extratropical weather extremes. Although over the last decades state-of-the-art climate models have demonstrated MJO prediction skill exceeding 5 weeks, there is still room for improvement. In this study we use Multiple Linear Regression (MLR) and a Machine Learning (ML) algorithm as post-processing methods to improve the forecast of the model that currently holds the best MJO forecasting performance, the European Centre for Medium-Range Weather Forecasts (ECMWF) model. We find that both MLR and ML improve the MJO prediction and that ML outperforms MLR. The largest improvement is in the prediction of the MJO geographical location and intensity.
Riccardo Silini et al.
Status: open (until 06 May 2022)
-
RC1: 'Comment on egusphere-2022-2', Anonymous Referee #1, 21 Mar 2022
General Comments
This paper presents the use of Multiple Linear Regression (MLR) and a Machine Learning (ML) algorithm as post-processing methods to improve MJO forecasts of the ECMWF model. It is generally well written and showcases successful results in improving MJO forecasts. Still, the manuscript needs improvement in describing the technical implementation, which is somewhat short and confusing in places, such as the relation between input and output neurons and lead time, as well as the MLR implementation.
Specific Comments
- Line 105: “After selecting the number of output neurons (which is even and in fact defines our lead time, τ = Nh/2)” – shouldn’t be Nout instead of Nh?
- Line 110: It appears to me that for each lead time L (1<L<46), ML takes as input the predicted ECMWF trajectory RMM1,2 up to day L, and as output RMM1,2 in ERA5 observations up to day L+3 – please elaborate and clarify by confirming or correcting as necessary. Also, what is done when L = 44, 45 and 46?
- Section 2.5: the implementation of MLR is barely described at all; please expand, such as whether you use regularization to avoid overfitting, etc.
- Line 115: Please explain what a “walk-forward validation” is.
-
AC1: 'Reply on RC1', Riccardo Silini, 22 Mar 2022
We thank the reviewer for a careful revision of our manuscript that has allowed us to improve our work. With respect to the comments of the reviewer:
- Line 105: “After selecting the number of output neurons (which is even and in fact defines our lead time, τ = Nh/2)” – shouldn’t be Nout instead of Nh?
Authors’ response: We thank the reviewer for noticing this typo. Yes, indeed, τ = Nout/2.
- Line 110: It appears to me that for each lead time L (1<L<46), ML takes as input the predicted ECMWF trajectory RMM1,2 up to day L, and as output RMM1,2 in ERA5 observations up to day L+3 – please elaborate and clarify by confirming or correcting as necessary. Also, what is done when L = 44, 45 and 46?
Authors’ response: We have revised the manuscript to clarify this point. The number of inputs is generally larger than the number of outputs, since we also use as input the information we have about future ECMWF-predicted RMMs. For each lead time L (1 < L < 46), we therefore have (L+3)*2 inputs, i.e., Nin = Nout + 6 (line 110). For L = 44-46, Nin is capped, since we do not have access to future values beyond the forecast horizon (line 110: Nin = Nout + 6, with an upper limit of 92 inputs). For lead times longer than 30-35 days the prediction skill is already poor (COR and RMSE have crossed the 0.5 and 1.4 thresholds), so the last lead times (44-46 days) are not crucial.
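The input/output sizing described in the reply can be sketched as a small helper. This is an illustrative reconstruction, not the authors' code: `layer_sizes` and the 46-day horizon default are assumptions based on the reply (outputs are the 2 RMM components for days 1..L, inputs include up to 3 extra ECMWF-predicted days, capped at 92 inputs).

```python
# Hypothetical sketch of the reply's dimension bookkeeping: Nout = 2L for
# the two RMM components, Nin = 2*(L+3) capped at the 46-day forecast
# horizon (i.e., at most 92 inputs).

def layer_sizes(lead_time, horizon=46):
    """Return (Nin, Nout) for a given lead time, 1 <= lead_time <= horizon."""
    n_out = 2 * lead_time                     # RMM1 and RMM2 for each day
    n_in = 2 * min(lead_time + 3, horizon)    # 3 extra future days, capped
    return n_in, n_out

print(layer_sizes(10))   # (26, 20): Nin = Nout + 6
print(layer_sizes(46))   # (92, 92): no future values beyond the horizon
```

At intermediate lead times the +6 offset holds exactly; near the end of the horizon the cap takes over, matching the reply's "upper limit of 92 inputs".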
- Section 2.5: implementation of MLR is barely described at all, please expand, such as do you use regularization to avoid overfitting, etc...
Authors’ response: We have revised the manuscript to clarify this point. We do not include a regularization term as in Ridge or Lasso regression; our MLR is ordinary least squares (OLS) linear regression. This choice also ensures consistency with Kim et al. (2021).
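A minimal sketch of an OLS multiple linear regression of the kind described in the reply (no regularization). The arrays `X` and `y` are random placeholders standing in for the ECMWF-predicted RMM inputs and ERA5 target RMMs; the shapes are only illustrative.

```python
import numpy as np

# OLS linear regression sketch (no Ridge/Lasso penalty, per the reply).
rng = np.random.default_rng(0)
X = rng.normal(size=(1700, 26))    # placeholder: train samples x input features
y = rng.normal(size=(1700, 20))    # placeholder: train samples x output targets

X1 = np.hstack([np.ones((len(X), 1)), X])        # prepend intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)    # ordinary least squares fit

y_hat = X1 @ beta                                # in-sample predictions
print(y_hat.shape)                               # (1700, 20)
```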
- Line 115: Please explain what a “walk-forward validation” is.
Authors’ response: We have revised the manuscript to clarify what “walk-forward validation” is. The procedure is as follows. First, we train the network on an expanding training set and test its performance on a validation set containing the N samples that follow it. In our case, we found the optimal initial training-set size, out of the 2200 available samples, to be 1700. The training set is then extended by 100 samples (∼1 year) for each run and validated on the subsequent 200 samples (∼2 years). Walk-forward validation ensures that no information from the future of the test set is used to train the model.