A process-based evaluation of biases in extratropical stratosphere-troposphere coupling in subseasonal forecast systems
Abstract. Two-way coupling between the stratosphere and troposphere is recognized as an important source of subseasonal-to-seasonal (S2S) predictability and can open windows of opportunity for improved forecasts. Model biases can, however, lead to a poor representation of such coupling processes; drifts in a model’s circulation related to model biases, resolution, and parameterizations have the potential to feed back on the circulation and affect stratosphere-troposphere coupling.
In the Northern Hemisphere, nearly all S2S forecast systems underestimate the strength of the observed upward coupling from the troposphere to the stratosphere, downward coupling within the stratosphere, and the persistence of lower stratospheric temperature anomalies. While downward coupling from the lower stratosphere to the near surface is well represented in the multi-model ensemble mean, there is substantial inter-model spread likely related to how well each model represents tropospheric stationary waves.
In the Southern Hemisphere, the stratospheric vortex is over-sensitive to upward propagating wave flux in the forecast systems. Forecast systems generally overestimate the strength of downward coupling from the lower stratosphere to the troposphere, even as they underestimate the radiative persistence in the lower stratosphere. In both hemispheres, models with higher lids and a better representation of tropospheric quasi-stationary waves generally perform better at simulating these coupling processes.
Status: closed
RC1: 'Comment on egusphere-2024-1762', Anonymous Referee #1, 22 Jul 2024
This paper discusses the ability of long-range forecast models to represent troposphere-stratosphere coupling. The results come from a set of state-of-the-art prediction systems and present an interesting set of results and metrics that could be used by operational centres developing future systems. I have mostly minor comments and suggestions.
L30: The celebrated criterion of no wave propagation above an upper wind threshold (Charney and Drazin 1961) is a linear rather than nonlinear result.
L64-66: Suggest you remove this summary of results as it is repetitive of the Abstract and pre-empts the results section.
L128: Why does an error in variance affect the correlations? After all, correlations are by definition insensitive to the amplitude of variability, so do you mean regressions here?
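A minimal synthetic illustration of this distinction (not code from the paper; the data below are made up): rescaling one variable changes the regression slope but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)             # predictor, e.g., a wave-flux index
y = 2.0 * x + rng.standard_normal(500)   # response with noise

for scale in (1.0, 0.5):                 # halve the amplitude of the response...
    ys = scale * y
    r = np.corrcoef(x, ys)[0, 1]         # correlation: unaffected by rescaling
    b = np.polyfit(x, ys, 1)[0]          # regression slope: halves with the amplitude
    print(f"scale={scale}: r={r:.2f}, slope={b:.2f}")
```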
Fig.1 is very striking and suggests very large errors in the total variance of the models. Is it really correct that there are tens of percent errors in variance with too much in the troposphere and too little in the stratosphere? I have not seen this before and I think you should check and then emphasize this if it's robust.
Fig.2 would benefit from adding N Hem and S Hem labels.
L155 and throughout the paper: In many cases it is really only some of the models that show the errors highlighted, for example in Fig. 2a. Please can the paper be phrased more carefully to say things like "models in general" or "models tend to", to avoid giving the impression that all models show the same errors?
L185-190, L245, etc.: The paper tends to reference only very recent papers rather than giving a representative picture of current knowledge and following the scientific convention of acknowledging the papers that first demonstrated ideas. Some rewriting is needed to better represent this. For example, some wider discussion of the current knowledge of the effects of model lid height/degraded stratosphere would be welcome to put the results in wider context. See papers by Boville (J. Atmos. Sci., 1984), Lawrence (J. Geophys. Res., 1997), Marshall and Scaife (J. Geophys. Res., 2010), and Shaw and Perlwitz (J. Climate, 2010).
L265, L400: The underestimation of the heat flux variability in the stratosphere and upper troposphere is interesting. Is the underestimation of v*T* related to the underestimation of ENSO teleconnections reported in Garfinkel et al. (2022) and Williams et al. (2023)? Is this also related to the so-called signal-to-noise paradox in long-range forecasts, which appears to be clearer in the Northern Hemisphere than in the Southern Hemisphere, just like the biases reported here? Perhaps some discussion would be useful on these points.
Figure 10: please provide a full caption for ease of reading.
L362-364: Please again provide wider referencing for the surface impact of the stratosphere, e.g., Baldwin and Thompson (Q. J. R. Meteorol. Soc., 2009) and Kidston et al. (Nat. Geosci., 2015).
L450: This is a potentially important point and should be moved to the earlier methods section.
Citation: https://doi.org/10.5194/egusphere-2024-1762-RC1
AC1: 'Reply on RC1', Chaim Garfinkel, 15 Sep 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1762/egusphere-2024-1762-AC1-supplement.pdf
AC2: 'Reply on RC1', Chaim Garfinkel, 15 Sep 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1762/egusphere-2024-1762-AC2-supplement.pdf
RC2: 'Comment on egusphere-2024-1762', Edwin Gerber, 05 Aug 2024
The authors present a tour-de-force evaluation of stratosphere-troposphere coupling in subseasonal forecast systems. As I understand it, this comprehensive analysis is the product of a team effort, led by Chaim Garfinkel with Zachary Lawrence and Amy Butler, as part of the SNAP project. I recommend publication of this manuscript after consideration of the minor points below. This study will provide a valuable reference point for evaluating subseasonal-to-seasonal (S2S) forecasting systems, both for assessing the current state of systems and, more importantly, for identifying key metrics that modeling centers can use to measure improvement in the future. Furthermore, I found the manuscript well written and structured. The amount of information in the figures was at times overwhelming, but I appreciate that the goal is to document the state of the S2S systems. I commend the authors for both the thoroughness and quality of the analyses and presentation.
Minor points to consider
1) I trust that the lid height is not the most important feature of an S2S system. Rather, as the authors are fully aware, it is correlated with other features that matter. (This point was made clear when they had to decide what to do with WACCM, which has a much higher top than the other models. And to take it to the extreme, I trust that adding a layer in the thermosphere to any low top model will have little, if any, impact on that model's performance in the stratosphere.) I trust that models with a higher lid have better resolution in the stratosphere, better representation of sub-grid physics relevant to the stratosphere (gravity wave drag), and perhaps most importantly, indicate an interest in stratospheric dynamics from the relevant modeling center, so that care was taken to capture and evaluate performance of the model in this region.
The paper is already long, but some discussion, and possibly a little analysis, might help make this point clear. To be concrete, I would suspect that the resolution in the UTLS (upper troposphere and lower stratosphere), and more generally through the stratosphere, is most critical. A high lid's main contribution is likely to ensure that the needed numerical sponge layers at the top sit above the stratosphere, where they would otherwise corrupt the dynamics.
To be constructive, could the authors provide a bit more information in Table 1? For example, what is the vertical resolution near the tropopause, and how many layers are included between 100 and 10 hPa, and between 10 and 1 hPa?
Emphasizing that my suggestions are minor, it could also be important to identify features in the models, e.g., their representation of gravity wave momentum parameterizations (types: orographic, frontal, convective) and how well each is tied to sources (fixed sources, or dynamic, e.g., coupled with convection/frontogenesis). Might it also be possible to note how radiation is treated (particularly ozone)?
Finally, I’m curious if vertical resolution (grid spacing) or the number of levels in the stratosphere might be better metrics for comparison than lid height, say, in Figure 4. I appreciate this could be a rabbit hole – best left for future research, if at all – but more quickly, one could plot lid height vs resolution or number of levels, to quickly get a sense of how these things are correlated.
To end on a positive note, I appreciated how the authors explain that many of the correlations with lid height are even stronger if you correlate with the mean state (i.e., wave-1 variability differences correlate strongly with biases in the mean representation of wave 1). This provides a concrete pathway for trying to improve models, other than simply raising the lid!
2) It was at times hard for me to assess the sampling uncertainty in results. I put this as a minor point, as the goal here was to document differences between models, not to say that one model was “better” than another, or clearly wrong. Below are some comments meant to be helpful, not to nit pick.
The winter stratosphere is one of the most variable regions of the atmosphere, and work on vortex variability has been hampered by sampling uncertainty. From figure 1 onwards, I was unsure how much of a bias indicates a problem, as opposed to just bad luck. At the S2S time scale, we expect the atmosphere to be entering a chaotic regime, so I don’t think it’s fair to say that models must reproduce ERA5. Rather, they should be able to reproduce the statistics of ERA5.
To be constructive, you have a lot more data in ERA5. Might it be possible to evaluate metrics over 1979-1999, to give a very rough estimate of how much the “truth” changes depending on the period? If these two periods in ERA5 differ by X %, that would be a very rough estimate of sampling uncertainty.
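A rough sketch of such a two-period check, assuming the relevant ERA5 diagnostic is already in a single file; the file name, variable name, second period, and `compute_metric` below are all hypothetical placeholders rather than the paper's actual workflow.

```python
import xarray as xr

def compute_metric(da):
    # stand-in for whichever diagnostic is being evaluated,
    # e.g., the standard deviation of a winter-mean index
    return float(da.std())

era5 = xr.open_dataset("era5_diagnostic.nc")["vT100"]   # hypothetical file and variable

early = compute_metric(era5.sel(time=slice("1979", "1999")))
late = compute_metric(era5.sel(time=slice("2000", "2020")))

# the difference between the two periods gives a crude estimate of how much
# the "truth" can change from sampling alone
print(f"early={early:.3f}, late={late:.3f}, diff={100 * (late - early) / early:.1f}%")
```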
In many of the figures, e.g., 2, 7, and 11, the authors show how the answer in ERA5 changes when the data are subsampled as with the forecast systems. This is exactly the type of analysis I would like to see, but I didn't fully understand how they did it. Maybe just a paragraph in the methods section would help. Also, in the text, I didn't see too much discussion of these sampling error bars. For instance, in the discussion surrounding Fig. 2a and b, at line 156 the authors state that the models systematically underestimate the correlation and regression coefficient of wave 1. I agree that most models fall below the thick black line, but don't many fall within the sampling uncertainty here? [As I emphasized above, this is a minor concern, as the authors are not trying to explicitly say that models are wrong, but rather to establish a metric for comparison.]
Below are small questions about statistics.
- At line 170, you talk about a correlation of -0.34. The caption notes that a correlation of -0.42 is needed to reject the null hypothesis at 95% confidence. I suspect there's a real problem with the models, but it would be good to acknowledge in the text that this could be by chance. (A rough sketch of where such a threshold comes from follows this list.)
- If you consider just the high top models, is there any significant correlation with lid height? If not, then it’s strong evidence that once the lid is sufficiently high to get the sponge out of the stratosphere, lid height doesn’t matter any more.
- For Figures 6 and 8, I wonder if the fact that the models provide ensemble forecasts could weaken the correlation. [As I understand, at least for Figure 8, the correlation was first computed for each ensemble member and then averaged; I'm not sure if something similar was done for Figure 6.] Wouldn't this have a tendency to reduce the correlation for the models, since you are comparing a model mean against a single sample from ERA5? To be constructive, do the magnitudes of the correlation from the models increase to similar values if you consider only one ensemble member? Or could you put a rough uncertainty estimate on the ERA5 values? (A sketch contrasting the two ways of computing the correlation also follows this list.)
- Line 215 and Figure 7: As I noted about the sampling error bars on Figure 2, they don't seem to factor into the discussion of Figure 7 here. I agree the models are below the thick black line, but many seem within the sampling uncertainty.
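For reference, a standard two-tailed significance threshold for a Pearson correlation follows from the t distribution; the sample size in the sketch below is only an assumed placeholder for the number of forecast systems, and this is not necessarily how the paper computed its -0.42 value.

```python
import numpy as np
from scipy.stats import t

def critical_r(n, alpha=0.05):
    """Two-tailed critical Pearson correlation for n independent samples."""
    df = n - 2
    tcrit = t.ppf(1 - alpha / 2, df)
    return tcrit / np.sqrt(df + tcrit ** 2)

print(critical_r(22))   # ~0.42 if there were 22 systems (n is an assumption)
```

And a sketch of the two ways of computing the correlations contrasted above; the array shapes are assumptions, and neither function is claimed to match what the paper actually does.

```python
import numpy as np

# members: model forecasts of shape (n_members, n_times); obs: ERA5 series of shape (n_times,)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def memberwise_then_average(members, obs):
    # correlate each member with ERA5, then average the correlations
    # (the reading of Fig. 8 described above)
    return np.mean([corr(m, obs) for m in members])

def ensemble_mean_correlation(members, obs):
    # correlate the ensemble mean against the single ERA5 realization;
    # averaging members damps unpredictable noise, so this generally
    # differs from the member-wise average
    return corr(members.mean(axis=0), obs)
```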
To end on a positive note, Figure 9 was particularly eye-catching. I was initially worried about the statistics of extreme events, but this figure makes a pretty compelling argument that something is wrong with the models. It begs future work on the spread of the S2S systems with time: they are both drifting to a biased mean state, but seem to be losing a lot of variability!
—
Tiny comments
I believe that “e.g.” should always be followed by a comma, e.g., as I just demonstrated. A quick search would spot ones you missed.
At line 98, a closing parenthesis is missing, or you could just eliminate the first one before "e.g.".
Lines 311 and 439: I did not find this long list of citations about how we don't understand planetary waves very informative. Consider being a bit more sparing here, or breaking up the list and briefly explaining the key contributions of the citations.
I felt the opening line of the discussion and conclusions could be flipped. I’d first note that the troposphere perturbs the stratospheric circulation (i.e., there’s a reason why stratospheric variability differs so much between the hemispheres), and then that stratosphere in turn impacts the troposphere.
At line 425 I thought that recent work by Marina Friedel and Gabriel Chiodo was particularly relevant. The authors mention this work shortly afterwards, at line 470, so I think it’s fine.
—
To finish my review, I wanted to note that the authors very nicely sum up the importance of this work in the last paragraph. Bravo!
Citation: https://doi.org/10.5194/egusphere-2024-1762-RC2
AC3: 'Reply on RC2', Chaim Garfinkel, 15 Sep 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1762/egusphere-2024-1762-AC3-supplement.pdf