the Creative Commons Attribution 4.0 License.
Austral Summer MJO Forecast Skill in S2S Models: Decadal Shifts and Their Drivers
Abstract. The Madden–Julian Oscillation (MJO) is a key driver of global subseasonal-to-seasonal (S2S) climate variability, influencing tropical convection and initiating teleconnections that affect weather patterns worldwide. Improving understanding of the factors that constrain MJO predictability is therefore critical for advancing S2S forecasting systems. Using a multi-model framework, we evaluate changes in MJO prediction skill between two periods (1981–1998 and 1999–2018) during austral summer (December–February) and examine the processes underpinning these differences. Our analysis reveals a pronounced decadal decline in MJO forecast skill, with high-skill years in 1981–1998 showing prediction lead times of around 10 days longer (based on the bivariate correlation of the RMM index) than in 1999–2018, while low-skill years show little change. This asymmetric reduction coincides with stronger MJO amplitude in the earlier period, despite relatively stable model mean-state biases in tropical SSTs and lower-tropospheric moisture. Key findings include: (1) persistent moisture biases across both periods, yet higher skill in 1981–1998, suggesting that model errors alone cannot explain the differences; (2) a stronger Quasi-Biennial Oscillation (QBO)–MJO relationship in the first period, independent of stratospheric resolution; and (3) weakened coupling between the MJO and large-scale climate modes, including the QBO, El Niño–Southern Oscillation (ENSO), and Indian Ocean Dipole (IOD), in 1999–2018, indicating reduced dynamical support for prediction. These results suggest that decadal variations in MJO skill are strongly influenced by changes in the background dynamical environment. They highlight the need for S2S systems to improve representation of tropospheric processes and stratosphere–troposphere coupling, particularly when large-scale climate forcing is weak.
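For readers unfamiliar with the skill metric named above, the bivariate correlation of the RMM index can be sketched in a few lines; this is a minimal illustration, and the array names (observed and forecast RMM1/RMM2 series pooled over verification times at one fixed lead) are hypothetical:

```python
import numpy as np

def bivariate_correlation(obs_rmm1, obs_rmm2, fc_rmm1, fc_rmm2):
    """Bivariate (vector) correlation between observed and forecast
    RMM indices, pooled over all verification times at one lead time."""
    num = np.sum(obs_rmm1 * fc_rmm1 + obs_rmm2 * fc_rmm2)
    den = (np.sqrt(np.sum(obs_rmm1**2 + obs_rmm2**2)) *
           np.sqrt(np.sum(fc_rmm1**2 + fc_rmm2**2)))
    return num / den
```

The MJO prediction limit is then conventionally taken as the longest lead time at which this correlation stays above 0.5, which is how "prediction lead times of around 10 days longer" should be read.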
Status: closed
RC2: 'Comment on egusphere-2025-4453', Anonymous Referee #2, 01 Nov 2025
This study evaluates the relationship between the MJO and tropical modes of climate variability such as ENSO, the IOD, the IOBM, and the stratospheric QBO during two periods: 1981-1998 and 1999-2018. The analysis indicates different relationships between the MJO and the other modes of variability from one period to the other. Assuming that climate modes provide a source of predictability for the MJO, the second objective of the study is to test whether models show a change in MJO forecast skill between the two periods. While the first part is robust, the approach chosen for the second part has limitations, because the POAMA2 model has a poor representation of stratospheric dynamics and the CESM2 and GEOS-S2S-2 models do not have data for the first period. Other concerns that can be addressed are listed below.
L236: The three models used in the study (ACCESS-S2, CESM2 and GEOS-S2S-2) are not part of what is known in the community as the S2S data base: https://apps.ecmwf.int/datasets/data/s2s/levtype=sfc/type=cf/
Section 3.1: Please provide a table showing which years have been used for each of the phases of the climate modes shown in Fig. 1. The table can be in the supplement file.
L238: Please discuss the source of initial conditions used for POAMA2 and ACCESS-S2. If they use the same initial conditions the difference in skill will be solely due to models’ differences. If there was any change in the DA system used to generate the initial conditions between the two periods, that should also be discussed.
L389-394 and L419-422: Coincidently, the two periods considered in the study correspond to two phases of the Pacific Decadal Oscillation (PDO). 1981-1998 is mostly dominated by positive values of the PDO index whereas the 1999-2018 is dominated by negative values of the PDO index. This is also another factor affecting the mean state and should be mentioned when describing the shift in the background state.
L396: Please explain how ‘the mean DJF duration and total yearly event count for DJF’ are calculated.
L399-400: Figure S1 shows the phases grouped as 4, 5, 6, 7 and 8, 1, 2, 3. One cannot see that ‘the MJO spends more days in phases 3–6.’ If this is the message the figure is intended to convey, then the grouping of phases should be 3, 4, 5, 6 and 7, 8, 1, 2.
L405-407: If the negative correlation is explained by the enhanced frequency of N-IOD years, does the reversal of sign mean an increased frequency of positive IOD years? Does ‘weakening’ mean a lower value of r?
L405-422, L579-590: I suggest summarizing all correlation coefficient values into a table.
Figure 2: Panel B ‘S2’ should be ACCESS-S2
L512-514: These results should be connected to the findings of Jiang et al. (2015, https://doi.org/10.1002/2014JD022375). They also show that feedbacks between moist convection and circulation are critical for simulation of the MJO.
Please explain the interpretation of the regression analysis. The idea of identifying patterns associated with high/low MJO skill depends on how low skill is defined. For example, if the correlation coefficient has a large negative value, the skill is low, but the regression coefficient will have a large value. Second, what is the reason for regressing observations onto the model skill? And lastly, the regression coefficients in Fig. 4 show very limited statistical significance, which raises the question of how robust this analysis is.
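To make the setup being questioned here concrete, a regression map of this kind is typically built by regressing the field at each grid point onto a standardized yearly skill index. A minimal sketch, assuming the skill index is a yearly scalar series and the field is a (years, lat, lon) anomaly array; all names and shapes are hypothetical:

```python
import numpy as np
from scipy import stats

def regress_field_on_skill(field, skill):
    """Regress a (years, lat, lon) anomaly field onto a standardized
    yearly skill index; returns per-gridpoint slope and p-value maps."""
    z = (skill - skill.mean()) / skill.std()   # standardize the predictor
    nlat, nlon = field.shape[1:]
    slope = np.empty((nlat, nlon))
    pval = np.empty((nlat, nlon))
    for j in range(nlat):
        for i in range(nlon):
            res = stats.linregress(z, field[:, j, i])
            slope[j, i] = res.slope
            pval[j, i] = res.pvalue
    return slope, pval
```

Because the predictor is standardized, the slope carries the field's units per one standard deviation of the skill index, and its interpretation indeed depends on how that skill index is defined, which is the reviewer's point about large negative correlation values.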
The usage of the POAMA2 model for the evaluation of the MJO-QBO relationship raises more questions about this model's ability to resolve the stratosphere than are acknowledged in the study. The model top is located at 10 hPa, meaning that the model does not have a full stratosphere. Compared to the QBO lifecycle, these forecasts are relatively short, and the model might be tuned to have a ‘good QBO’ but miss the QBO dynamics.
Fig. 6 Caption: Please explain why some boxes are filled with color. On the x-axis, please use fonts of different colors to denote the two periods, and draw a thick vertical line between the left side of the plot (event count) and the right side of the plot (event duration).
Citation: https://doi.org/10.5194/egusphere-2025-4453-RC2
AC1: 'Reply on RC2', Raina Roy, 18 Dec 2025
We thank Reviewer 2 for their detailed feedback on our study. We acknowledge the critical limitations regarding model configuration and data availability and address each concern below. Please see the attached file for our responses to all comments.
AC3: 'Comment on egusphere-2025-4453', Raina Roy, 18 Dec 2025
We thank the editor and the reviewers for their time and constructive comments on our manuscript, 'Austral Summer MJO Forecast Skill in S2S Models: Decadal Shifts and Their Drivers'. We have carefully considered the suggestions, particularly regarding limitations on sample size, clarifying the QBO-MJO relationship, and the influence of the PDO. We believe the major revisions detailed below significantly strengthen the paper's robustness and clarity.
Citation: https://doi.org/10.5194/egusphere-2025-4453-AC3
RC1: 'Comment on egusphere-2025-4453', Anonymous Referee #1, 29 Oct 2025
The article evaluates the MJO predictive skill between two periods (1981-1998 and 1999-2018) using several S2S forecasting systems. The authors found that the MJO predictive skill was lower in the later period, particularly during high-skill years. They relate this change in predictive skill to changes in the background dynamical environment.
This article addresses the important topic of the interdecadal variability of MJO predictability and predictive skill. A better knowledge of this variability might help identify current model limitations in predicting the Madden-Julian Oscillation. However, the presentation of this article needs to be improved: the text is not always clear, and some results seem contradictory. Therefore, I recommend major revisions.
Major comments:
- A major limitation of this study is that the periods considered are too short. 20 years is likely not enough to investigate the link between the MJO and ENSO, the QBO, or the IOD. This is mentioned as a caveat at the end of the article but should be further discussed. A possible explanation for why the two periods display such different relationships between background state and MJO characteristics might be that 20 years is not enough. For instance, according to Figure 1, the number of years with a positive IOD in the period 1981-1998 is only 2! It is not guaranteed that such a small sample is representative of the general population. If it is not, then the bootstrap resampling method is likely to be overconfident. Figure 1 should show error bars computed over each population. I am expecting these error bars to be huge, at least for some of the climate indices. Sub-sampling between high-skill and low-skill years makes the sample even smaller. Bootstrap resampling might not be accurate over such a tiny sample. It would be interesting to compare some of the results (e.g. amplitude composites of Figure 1) with the results obtained over the 40-year period.
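The concern about bootstrapping tiny samples can be illustrated directly: with only two members in a composite, resampling with replacement can only recombine those two values, so the percentile interval is bounded by their spread regardless of the true population variance. A minimal sketch using synthetic, hypothetical amplitude-like values:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(sample, n_boot=10000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    n = len(sample)
    idx = rng.integers(0, n, size=(n_boot, n))   # resample with replacement
    means = sample[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Synthetic values only: with n = 2 the interval can never be wider than
# the gap between the two values, however variable the true population is.
population = rng.normal(1.2, 0.4, size=1000)
for n in (2, 5, 20):
    lo, hi = bootstrap_ci(population[:n])
    print(f"n = {n:2d}: CI width = {hi - lo:.3f}")
```

Comparing such interval widths against those obtained from the full 40-year record, as suggested above, would show directly whether the 20-year composites are overconfident.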
- This article contains several contradictions and instances of missing information, noted in the comments below.
Other comments:
- Line 254: anomalies relative to what climate? Is it the same climate for Period 1 and 2 (e.g. 40-year climate) or is the climate of period 1 (2) used to compute anomalies in period 1 (2)? If that’s the case, do you remove the scoring year in the climate calculation?
- Line 284: What period was used to train the VAR? Is the training period independent from the verification periods (1981-1998 or 1999-2018)?
- Abstract, line 119: “A stronger QBO-MJO relationship in the first period”: this contradicts Figure 1, which shows a statistically significant and larger QBO-MJO amplitude relationship in the second period.
- Lines 337-338: “2) easterly quasi-Biennial Oscillation (EQBO)”. This statement contradicts Figure 1, which shows statistical significance for the QBO only in period 2 (1999-2018), not in period 1 (1981-1998).
- Line 366-367: What happens if your resampled years do not contain any year with the targeted climate index phase (e.g. negative IOD)?
- Line 371: it would be useful to add a table with these correlations. How do these correlations compare with the correlations obtained over 40 years?
- Line 384: “cold phase of the IOBM”. Shouldn't it be the warm phase instead (and SST warming)?
- Line 400: “phase 3-6”. Legend of Figure S1 indicates phases 4-7, while caption indicates Phase 3-6. Which one is correct?
- Line 407: “weakened post 1998”: to what value?
- Line 410: “linking El Nino (La Nina) to increased (decreased) MJO activity that weakened substantially thereafter.” Figure S1 suggests a modulation of the MJO phase count (total of all phases) by La Nina that is stronger in 1999-2018 than in 1981-1998 but of opposite sign.
- Line 412-414: This could be easily checked by counting and comparing the number of days with all the favourable phases of the indices co-occurring in both periods.
- Line 417: “and more substantial phase-specific enhancement “. Figure S1 shows greater difference between the phases 4-7 and 8-3 in Period 2 (1999-2018) than Period 1 (1981-1998) for EQBO.
- Line 421-422: I don't agree with this statement. Figure S1 also shows a reversal for EQBO (between phases and also between EQBO and WQBO), suggesting that the QBO impact on the MJO might not be that robust, or at least no more robust than for the tropospheric indices.
- Line 428: “Dynamical models show strong inter-model agreement.” What is the correlation of the interannual variability of MJO skill between the different models over the two periods? It would also be interesting to compute the correlation with the VAR.
- Fig. 2 A-B: If there is a significant difference in MJO skill between the periods, this should be visible as a trend when considering the full 40-year period 1981-2018. Is there a significant trend when considering the time series in Fig. 2 A and B together?
- Fig. 2 C-D: These figures are hard to read. First, the meaning of solid vs. dashed lines is not explained in the figure's caption. Secondly, the differences between the solid (early period) and dashed (later period) lines are difficult to see since lines of different colors are superimposed. I would suggest plotting the difference between the earlier and later periods, with error bars, in addition to these panels or as a replacement.
- Line 498: There might still be a relationship but not a dominant one.
- Lines 518-519: “30-day lead OLR and 850 hPa specific humidity”: is the lead time day 30 or the average from day 0 to 30 as indicated in Figure 4? If it is day 0-30, why is it different from the focus of the paper which is day 15-25 (line 307)?
- Figure 4: This figure might need more explanation, particularly of the difference between the left and right panels. I suppose that the left panels show observed OLR or humidity regressed against the MJO skill score, while the right panels show the regression with model-predicted OLR and humidity. The discussion of this figure in the text is very confusing and needs to be clarified, because it is not clear which of these panels is being discussed. The stippled regions are difficult to see.
- Line 561-562: Has this consistent background been observed in the second period and what was the impact on the MJO?
- Line 580: multi-model mean: was the multi-model mean computed from the same models for the two periods (in which case only POAMA2 and ACCESS-S2), or does it differ between periods to include all the models available in each period? If it is the latter, the comparison of multi-model skill between the periods is not valid, since the multi-model composition is different.
- Lines 598-599: how often did these co-occurring climate conditions happen during these two periods? 20 years is probably already too small a sample to assess the impact of one climate mode on MJO predictive skill, but the combination of four different climate modes over 20 years is likely to produce only a tiny sample.
- Line 630: “observed climate indices”: why not using model predicted climate indices? The model MJO activity and observed climate indices might become inconsistent after a couple of weeks. Would this table be different if the correlation was against model indices?
- Figure 6: Colors are too dark making it difficult to read the numbers.
- Line 680: “CESM2 performs distinctly worse than other models” GEOS-S2S-2 correlation is just 0.01 higher. This small difference of skill with CESM2 (0.70 vs 0.69) is unlikely to be statistically significant.
- Table 2: What is the lead time?
- Line 715: “minimal difference”: I thought that Figure 2 showed a relatively large difference in skill between the two periods for ACCESS-S2 and POAMA2?
Citation: https://doi.org/10.5194/egusphere-2025-4453-RC1
- AC2: 'Reply on RC1', Raina Roy, 18 Dec 2025