the Creative Commons Attribution 4.0 License.
A rapid application emissions-to-impacts tool for scenario assessment: Probabilistic Regional Impacts from Model patterns and Emissions (PRIME)
Abstract. Climate policies evolve quickly, and new scenarios designed around these policies are used to illustrate how they affect global mean temperatures using simple climate models (or climate emulators). Simple climate models are extremely efficient but are limited to showing only the global picture. Within the Intergovernmental Panel on Climate Change (IPCC) framework, there is a need to quickly understand the regional impacts of scenarios that incorporate the most recent science and policy decisions, in order to support governments in negotiations. To address this, we present PRIME (Probabilistic Regional Impacts from Model patterns and Emissions), a new flexible probabilistic framework that aims to provide an efficient means of running new scenarios without the significant overheads of larger, more complex Earth system models (ESMs). PRIME provides the capability to include the most recent models, science and scenarios, to run ensemble simulations on multi-centennial timescales, and to analyse many variables that are relevant and important for impacts assessments. We use a simple climate model to provide the global temperatures that scale the patterns from a large number of CMIP6 ESMs. These scaled patterns provide the inputs to a weather generator and a land-surface model, which together generate an estimate of the land-surface impacts of the emissions scenarios. Here we test PRIME using known scenarios in the form of the Shared Socioeconomic Pathways (SSPs) to demonstrate that PRIME reproduces the climate response to a range of emissions scenarios, as shown in the IPCC reports. We show results for a range of scenarios, including SSP5-8.5, the high-emissions scenario that was used to define the patterns; SSP1-2.6, a low-emissions mitigation scenario; and SSP5-3.4-OS, an overshoot scenario.
PRIME correctly represents the climate response for these known scenarios, which gives us confidence that PRIME will be useful for rapidly providing probabilistic, spatially resolved information for novel climate scenarios, substantially reducing the time between the scenarios being released and their use in impacts assessments.
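For orientation, the generic form of the pattern scaling step described above can be written as follows. This is a sketch of the standard technique, not necessarily PRIME's exact formulation: the symbols are introduced here for illustration, and the regression through the origin (rather than with an intercept), as well as the time window over which the sum runs, are assumptions.

$$
\Delta V(\mathbf{x}, m, t) \approx P_V(\mathbf{x}, m)\,\Delta T_g(t),
\qquad
P_V(\mathbf{x}, m) = \frac{\sum_{t} \Delta T_g(t)\,\Delta V_{\mathrm{ESM}}(\mathbf{x}, m, t)}{\sum_{t} \Delta T_g(t)^{2}}
$$

Here $\mathbf{x}$ is a grid cell, $m$ a calendar month, $\Delta T_g(t)$ the global-mean temperature anomaly supplied by the simple climate model, and $\Delta V_{\mathrm{ESM}}$ the anomaly of variable $V$ in the ESM run used to fit the pattern. The scaled field $\Delta V$ is what would then be passed on to the weather generator and land-surface model.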
Status: closed
RC1: 'Comment on egusphere-2023-2932', Anonymous Referee #1, 22 Mar 2024
The authors propose a global downscaling system based on “pattern scaling” where the global-mean temperature is used to predict local mean values of all variables on a GCM grid, which is then used to drive an offline land model in order to calculate climate impacts. This approach is considerably cheaper than running a GCM and is proposed for generating large ensembles and testing more complete sets of scenarios while still providing regionally specific outputs.
I think this system is worth publishing but have some issues to raise that will require moderate revisions.
General Comments
- The success of the pattern scaling approach is not well tested in my opinion, mainly because most of the tests examine the change by end of the century under the RCP8.5 scenario, which is exactly the same one used to determine the pattern; and the change by end of century will dominate the variance that a max-likelihood linear fit is trying hardest to fit. Thus the success is built in by design and the agreement shown in e.g. Fig. 5 is meaningless. What is needed is an out-of-sample test such as the accuracy at mid-century, and/or for other RCP scenarios—the RCP6 or 4.5 scenarios would seem like the obvious test targets. The time series comparisons (Figs. 6,7) look OK but not great, as there are some errors mid-century that are as large as the signal. This suggests that the pattern scaling approach isn’t as accurate as we’d like.
In short, the paper needs to get rid of RCP8.5 tests and instead show tests on at least two other RCPs to give a realistic idea of out-of-sample performance.
- One reason the performance isn’t always good, especially on precipitation, may be that the authors are ignoring the direct effects of CO2 on precipitation which are substantial (e.g. Bony et al. 2013). Past studies show that by combining the effects of CO2 and global-mean temperature, precipitation patterns can be well captured, but not based on temperature alone. Since the authors are already feeding CO2 and global-mean T to JULES, why not also use CO2 as a second predictor for the downscaling?
- I am very confused by the so-called “weather generator” since the text states that the same weather is used for every day of any given month (line 128). If so, that is extremely unrealistic and will produce extreme responses in the land model (since it will either rain every day of the month or not at all). This doesn’t sound like an actual weather generator. Little else is said about the weather generator except to cite Williams and Clark 2014—we should not have to look there to get basic information about what kind of weather is being inserted. If indeed the weather is being held constant for a whole month at a time and then switches to something else on the 1st of the next month, this needs to be highlighted as a significant limitation in discussing the results.
- The probabilistic framework being used is not clear from the description. Any probabilities will depend on the priors for example, which are not stated, and on what observations the probabilities are conditioned on which is also not stated. There are also some confusing statements in the text (see detailed comments below). This needs to be clarified if the intention is for this tool to be used for probability estimation. It looks like the pdfs are traced to an ensemble calculated by WGI of AR6 but still the assumptions should be stated here.
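Referee #1's first two general comments (in-sample versus out-of-sample testing, and CO2 as a second predictor) can be illustrated with a minimal, noise-free toy sketch using only the Python standard library. Everything in it is hypothetical: a one-box energy balance model stands in for the simple climate model, CO2 forcing uses the common logarithmic approximation 5.35 ln(C/C0), and the "true" regional precipitation response (2 % per K of warming plus -1 % per W m-2 of CO2 forcing) is invented for illustration. Both patterns are fitted on a steadily rising "high" scenario and then evaluated on an overshoot scenario they have not seen.

```python
import math

def co2_forcing(conc, c0=280.0):
    # Simplified logarithmic CO2 radiative forcing, 5.35 ln(C/C0), in W m-2
    return [5.35 * math.log(c / c0) for c in conc]

def one_box_temperature(forcing, sensitivity=0.8, tau=8.0):
    # Toy one-box model: warming relaxes towards sensitivity * F over roughly tau years
    temps, temp = [], 0.0
    for f in forcing:
        temp += (sensitivity * f - temp) / tau
        temps.append(temp)
    return temps

def toy_precip_change(temps, forcing, a=2.0, b=-1.0):
    # Hypothetical "true" response: slow temperature-driven part plus a fast direct CO2 part
    return [a * t + b * f for t, f in zip(temps, forcing)]

def fit_one(x, y):
    # Least-squares slope through the origin: a single-predictor pattern coefficient
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def fit_two(x1, x2, y):
    # Least squares through the origin with two predictors: solve the 2x2 normal equations
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

years = range(101)
# Steadily rising CO2 (ppm) for the "training" scenario
conc_high = [280.0 * math.exp(0.011 * t) for t in years]
# Overshoot: the same rise until year 60, then a decline
peak = 280.0 * math.exp(0.011 * 60)
conc_over = [280.0 * math.exp(0.011 * t) if t <= 60
             else peak * math.exp(-0.008 * (t - 60)) for t in years]

f_high, f_over = co2_forcing(conc_high), co2_forcing(conc_over)
t_high, t_over = one_box_temperature(f_high), one_box_temperature(f_over)
p_high, p_over = toy_precip_change(t_high, f_high), toy_precip_change(t_over, f_over)

# Fit both patterns on the high scenario only
k = fit_one(t_high, p_high)
k1, k2 = fit_two(t_high, f_high, p_high)

# Evaluate on the training scenario and on the unseen overshoot scenario
in_t_only = rmse([k * t for t in t_high], p_high)
out_t_only = rmse([k * t for t in t_over], p_over)
out_two = rmse([k1 * t + k2 * f for t, f in zip(t_over, f_over)], p_over)
print(f"T only:    in-sample RMSE {in_t_only:.3f}, out-of-sample RMSE {out_t_only:.3f}")
print(f"T and CO2: out-of-sample RMSE {out_two:.3f} (coefficients {k1:.2f}, {k2:.2f})")
```

Because the toy truth is built from exactly these two predictors, the two-predictor fit recovers the invented coefficients up to rounding and its out-of-sample error is essentially zero, whereas the temperature-only pattern carries an error into the overshoot run. Real ESM output adds internal variability and other forcings, so the contrast would be less clean.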
Detailed/Technical Comments
line 183: by constraining future projections here, do you mean constraining equilibrium climate sensitivity? This is not the same as constraining RCP projections directly (which depend on factors other than ECS, most importantly historical forcings).
line 196: this is stated a bit confusingly—I assume the CO2 is a forcer to the land model, not to the meteorology (which depends only on global-mean temperature).
line 200-202: I don’t understand why in your ensemble, CO2 and ECS would be correlated. I think this is because the important elements noted in Major Point 4 are missing. Even if you are conditioning on historical warming I don’t see why a higher future CO2 would imply a lower ECS. This would only make sense if you were targeting a specific warming, but that isn’t stated clearly and you are showing a spread of possible warmings for any given RCP, as occurs in standard GCM simulations where a prior is placed (implicitly and usually independently) on both ECS and carbon cycle parameters and this implies a posterior distribution of temperature at any future time. There are a number of past studies that obtain pdfs of future warming conditional on historical warming using EMICs, and this study should follow a similar approach; most of them use the GCM ECS distribution as a prior but some use observationally-constrained priors on ECS.
Fig. 3: y axis or caption needs to identify at what time the CO2 concentration is determined.
Fig. 4: upper right figure panel needs to specify what humidity it is (specific humidity, according to the text).
line 220-222: I think what this text means is that you are correlating the decadal means of the predicted vs. CMIP variables; please state this clearly. I would not say the correlations are very good for precipitation, wind, etc., since much of the map is around 0.4 or less, which means that only about 20% of the variance (or less) is captured by the emulator.
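The relationship used in the last comment, that the squared Pearson correlation equals the fraction of variance explained by an ordinary least-squares line with intercept, can be checked numerically. The data below are arbitrary illustrative numbers, not values from the manuscript.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_r_squared(x, y):
    # Fit y = intercept + slope * x by ordinary least squares; return 1 - SS_res / SS_tot
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Arbitrary illustrative "emulated" vs "ESM" decadal means
emulated = [0.1, 0.4, 0.3, 0.9, 0.7, 1.2, 0.8, 1.5]
esm = [0.6, 0.2, 0.9, 0.5, 1.4, 0.7, 0.4, 1.1]

r = pearson_r(emulated, esm)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, OLS R^2 = {ols_r_squared(emulated, esm):.2f}")
for r_map in (0.3, 0.4, 0.45):
    print(f"a map value of r = {r_map} explains {100 * r_map ** 2:.0f}% of the variance")
```

For reference, r = 0.4 corresponds to 16% of the variance and r = 0.45 to about 20%.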
--------
Bony S; Bellon G; Klocke D; Sherwood S; Fermepin S; Denvil S, 2013, 'Robust direct effect of carbon dioxide on tropical circulation and regional precipitation', Nature Geoscience, 6, pp. 447 - 451, http://dx.doi.org/10.1038/ngeo1799
Citation: https://doi.org/10.5194/egusphere-2023-2932-RC1
RC2: 'Comment on egusphere-2023-2932', Anonymous Referee #2, 04 Jun 2024
Review:
General Description:
The authors present a combination of three existing model approaches – a global climate model emulator (FaIR), a traditional pattern scaling approach, and the JULES land model. They term this chain of models PRIME, suggesting that "PRIME correctly represents the climate response for [these] known scenarios,..." and that "PRIME enables the state-of-the-art science to be used throughout the modelling chain starting from the latest scenarios all the way to the simulations of regional impacts."
Overall Comment:
If the paper were presented with a heading like "Pattern-scaling approaches to drive JULES" or something similar, then I think it would be a great addition to the scientific literature – as it is a fine example of how the chain from global emission scenarios to some land-based impact metrics can be made with a number of (simplified) assumptions. However, the paper presents itself as playing in a different league, e.g., to "bypass ESMs" (line 66), or implicitly suggests that ISIMIP bias-corrected ESM outputs could be replaced (lines 25 to 40ff). With these heightened expectations, I'm sorry to say that the paper is underwhelming. The reasons are:
- The chosen method to justify the adequacy of pattern scaling (e.g., Fig 4): The authors show the Pearson correlation coefficients between scaled patterns (derived apparently from a linear regression of CMIP6 output against global mean temperature) and the CMIP6 data. I am very confused about this choice, i.e., to use a Pearson correlation coefficient for "Evaluation" (see e.g., caption of Figure 4). Suppose there is no change in regional precipitation in a specific region under climate change. If the pattern scaling "gets it correct" and indicates zero changes for those grid points, the applied Pearson correlation coefficients would be around zero (as there is then no linear relationship between pattern scaling and CMIP6). Thus, the authors should use either the standard RMSE (see Chapter 3 of IPCC AR6 WG1) or other useful metrics, or explicitly justify their use of the Pearson correlation coefficients.
- Unclear p-values and low skill for 5 out of 8 variables: In Figure S2, the authors show the percentage of p-values, averaged over models and months. First, I couldn’t find any statistical description of what null hypothesis was tested. That there is no climate change? The authors are strongly urged to complement the paper with a detailed statistical section that both illuminates their suggested uses of Pearson correlation coefficients (or ideally other evaluation metrics) and the p-values here in Figure S2. On the substance of it: Only three out of the eight variables are shown to have satisfactory ‘p-values’ of <0.05 (whatever exactly was measured here). How can the authors claim the low percentages of <0.05 p-values (take e.g., Northern Europe or North America precipitation changes) as an indicator that "PRIME correctly represents the climate response for these known scenarios" (Abstract, line 16f.). That seems a really far-fetched conclusion given the presented results in Figure S2 and Figure 4, which seem to suggest that for precipitation, the PRIME pattern scaling results are essentially useless for many key world regions.
- Almost the same as a study from a quarter-century ago. As the authors state, PRIME is very similar to the year 2000 pattern scaling approach by Huntingford and Cox. Indeed, the two papers have a very similar scope, the earlier one using the TRIFFID model instead of JULES. And arguably, the study from a quarter-century ago uses more elaborate time-series plots and statistics to showcase its merit (despite the generally strong limitations of pattern scaling for the majority of variables).
- Science has evolved since 2000. Probably most fundamentally, I am concerned about the following point. For many of these five variables, the literature has progressed considerably, and it is now well established that simple pattern scaling does not work satisfactorily. It works like a charm for regional mean temperature, and even for extremes to some extent, but not for precipitation, wind, pressure, etc.
- Take, for example, precipitation. As Allen and Ingram pointed out in 2002 (https://doi.org/10.1038/nature01092), or Andrews et al., 2010 (https://doi.org/10.1029/2010GL043991), the hydrological cycle is subject to various constraints, but it is not driven only by global or regional surface air temperature changes. Changes in the vertical energy budget of the troposphere, i.e., the GHG radiative forcing itself as well as aerosol-cloud interactions, have a substantial effect on precipitation.
- Take, for example, wind and storm tracks: a key feature is that midlatitude storm tracks might move poleward due to a broadening of the Hadley cells (https://doi.org/10.1038/s41561-017-0001-8). That is fundamentally at odds with simple linear pattern scaling, which merely scales up the response at each fixed location.
- The somewhat poor alignment of PRIME results with observations for the historical period underscores these fundamental shortcomings of pattern scaling when venturing beyond temperature, specific humidity, and downwelling longwave radiation (which is not too different from lower-tropospheric air temperature). As in Huntingford and Cox (2000), Zelazowski et al. (2018), and others, the skill for regional precipitation patterns remains very low, to the point of not being very useful.
- PRIME is not open source, as the underlying JULES code is not open source, if I understand correctly.
- PRIME does not seem to be a tool in itself. In the code availability section, unless I overlooked it, there is no PRIME code available. The pointers are to the underlying energy balance model, to the ESMValTool (which has tons of functionality beyond patterns), and to JULES (upon request). Thus, the paper seems to describe how to apply three other models in sequence. Yet, PRIME is described as a ‘tool’ itself. What am I misunderstanding?
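Referee #2's objection to the Pearson correlation as an evaluation metric (first point above) can be reproduced with a small deterministic example using hypothetical numbers and the Python standard library only. Both regions below share the same stand-in for internal variability, and in both the emulator captures the forced signal, yet only the region with a strong forced change obtains a high correlation, while the RMSE is about the same in both.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

n = 20  # hypothetical number of decadal means; must be a multiple of 4 here
delta_t = [4.0 * i / (n - 1) for i in range(n)]  # global-mean warming, K
# Deterministic stand-in for internal variability: signs +,-,-,+ repeated, amplitude 0.1.
# Over each block of four this pattern sums to zero against a linear trend,
# so it is exactly uncorrelated with delta_t.
variability = [0.1 * (1 if i % 4 in (0, 3) else -1) for i in range(n)]

# Region A: essentially no forced change; the fitted pattern is near zero
pred_flat = [0.001 * dt for dt in delta_t]
esm_flat = list(variability)

# Region B: strong forced change that the pattern captures
pred_strong = [0.5 * dt for dt in delta_t]
esm_strong = [0.5 * dt + v for dt, v in zip(delta_t, variability)]

r_flat, r_strong = pearson_r(pred_flat, esm_flat), pearson_r(pred_strong, esm_strong)
rmse_flat, rmse_strong = rmse(pred_flat, esm_flat), rmse(pred_strong, esm_strong)
print(f"no-change region:     r = {r_flat:.2f}, RMSE = {rmse_flat:.3f}")
print(f"strong-change region: r = {r_strong:.2f}, RMSE = {rmse_strong:.3f}")
```

In both regions the residual is essentially the internal variability, which no pattern could predict, so the RMSE reflects comparable skill; the correlation instead mostly reflects the size of the forced signal.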
Overall Recommendation:
If the authors undertake major revisions, possibly combining a scaling back of the bold expectations that they raise among readers with additional strong skill statistics and an extensive set of limitations, I think the paper can be a very valuable contribution. That is because systems like the proposed PRIME one are definitely needed. The demand for probabilistic climate impact projection systems is definitely there. However, the paper has to be honest about the various shortcomings, rather than claiming that it correctly represents the climate projections of CMIP6 models (e.g., line 16 and similar statements in many other places). The system that Huntingford and Cox (2000) described was not too dissimilar, to be frank.
Detailed Comments:
- Line 15: ‘which was used to define patterns’. A stricter delineation into "training" and "verification" data throughout the manuscript would be appreciated. For example, it is not clear whether Figure 4 is based on the SSP5-8.5 data or not. It better not be.
- Line 19: ‘being used for impact assessments’. Monthly average patterns may be sufficient for some impact applications, but a simple pattern scaling approach that, for example, completely loses the covariance between temperature and precipitation extremes is not useful for all impact studies. As mentioned above, more precise wording would be appropriate (or a bit more humility, or both).
- Line 47 ff.: A more comprehensive discussion of the many similarities and few differences to the rich history of pattern scaling approaches seems useful. Certainly the 1999 Huntingford & Cox study, the Mitchell review paper, etc.
- Line 66: ‘Bypassing ESMs’ is strong wording – and I would say inappropriate. At best, the proposed approach can approximate a few key characteristic outcomes of ESMs.
- Line 73: ‘PRIME enables the state-of-the-art-science throughout the modelling chain’. It would be good if the authors can explain that a bit better, specifically with regards to changes in variability, compound risks, etc. (Or choose less hyperbolic wording).
- Line 109ff: So, if I understand the method correctly, only 9 runs are undertaken with the energy balance model, scanning the stated percentiles (by the way, it would be interesting to hear how confident the authors are in the min-max values, i.e., the 0% and 100% percentiles). In general, I think the methods section needs to be much expanded, and the authors should clarify exactly how these 9 EBM runs are combined with the various patterns. Also, how does the methodology cater for hot and dry futures versus hot and wet futures? Is any covariance preserved across the variables?
- Line 115ff: As mentioned above, in its current form the methodology description does not allow the study to be reproduced. I would argue that it should, even though the code is allegedly provided. For example, over which time window was the regression undertaken? Did it include all points from 1900 to 2100, or just the last twenty years (2081-2100) of the SSP5-8.5 run?
- Line 205, Figure 3: How does this plot show the validity of the chosen approach? Even though the underlying energy balance model is able to produce a joint probability between CO2 concentrations and global mean temperature, the sampling of the 9 percentiles is only sampling a fraction of the distribution in the CO2 concentration dimension?! The sampled CO2 concentrations in PRIME span 950ppm to 1100ppm, while the energy balance model output suggests a range from 800ppm to 1200ppm?! In some regions (such as the Amazon), an 800ppm or 1200ppm CO2 concentration might well produce a different physiological plant response, yet the probabilistic PRIME model does not seem to propagate that information forward?
- Line 230, Figure 5: If SSP5-8.5 data is used to derive the PRIME patterns, it is hardly a useful comparison to show a comparison between SSP5-8.5 CMIP6 and PRIME. That is like showing training data as independent verification, which misleads about the skill of the model.
- Line 238 and vicinity: The authors just skim over the fact that when compared to other scenarios that are not used for deriving the patterns, the skill gets worse (partly understandable because of the lower signal-to-noise ratio in lower scenarios). The authors need to unpack that with quantifiable information and more details (i.e., which periods, which models were looked at, what is the difference across CMIP6 models, how is the transient skill before the end of the century, etc.).
- Line 238: When the authors write: ‘However, the high correlations and low RMSE give us confidence to apply the pattern scaling…’ I am not sure what ‘high correlations’ the authors refer to? The SSP5-8.5 ones from which the patterns were derived or the lower SSP scenario ones? If the former, then the high correlations should not give confidence to anyone that the model is skillful outside its training scope. If the latter, then the detailed plots in the supplementary are providing the usual picture that linear pattern scaling provides: That it is very good for some variables, and absolutely unusable for others. For example, Figure S8 on shortwave downward radiation shows the ‘prime’ (pardon the pun) example, where pattern scaling with global mean temperature does NOT work.
- Table 1: It would probably be useful to the reader if the RMSE values are put into the context of the mean change of the respective variables.
- Table 1: The stated Pearson correlation coefficients seem oddly high, and I can’t reconcile them with the regionally disaggregated ones shown in Figure 4. For example, take precipitation in Figure 4: judging from the color scale, the global-mean Pearson correlation coefficient (if it were meaningful, see above) would be somewhere between 0.3 and 0.7. Yet the results for all three SSPs show values well above 0.83 in Table 1, even 0.97 for SSP5-8.5!? Please add much more methodological detail and/or code showing exactly what was done, and please explain why these values are consistent.
- Line 313: ‘These comparisons allow us to confidently use the PRIME framework to assess impacts…’ Again, I think this is another example of slightly overconfident language that does not appropriately reflect the various limitations.
- Line 317: The authors write: ‘Hence it is novel to show a full probabilistic range of the possible spread of simulated carbon balance (represented by NEP)’. See above. I doubt whether ‘full probabilistic’ is the right term here, as, e.g., the CO2 concentration uncertainties do not seem to be explored according to Figure 3.
- Line 359: The MESMER tool showed some improvements for surface air temperature when using additional predictors. That regional surface air temperature is the variable that is stunningly well already predicted with a pattern scaling approach. Nobody challenges the usefulness of pattern scaling for monthly mean regional temperatures. The authors seem to want to suggest here though that this marginal improvement would also be true for pattern scaling more generally. Really? I highly doubt it given the literature on scaling regional precipitation, for example, where GHG forcing, (regional) aerosol forcing have clearly been shown to not only provide marginal improvement but are vital predictors without which precipitation cannot be adequately projected (see above).
- Line 384: The authors write “Overall we have shown PRIME faithfully reproduces the climate response… “. If authors would phrase this conclusion to something like “Within the known limits of the linear pattern scaling approaches, we have shown that 3 out of the investigated 8 variables can adequately be projected in their individual monthly means’… or similar, I would have no issue with it. But conclusions like the one above are way overconfident in my view.
- I can see how much dedicated work went into this manuscript, which is why I apologize that I cannot be more positive. With major revisions, I think this manuscript can add a useful contribution to a very important field – but the current form of the manuscript requires an overhaul from multiple angles in my perception.
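The sampling concern raised for Line 205 / Figure 3 can be sketched with a hypothetical ensemble: if a handful of members are chosen purely by their rank in the warming distribution, the CO2 values that come with them are whatever those members happen to have, so they generally cover only part of the ensemble's CO2 spread. The ensemble size, means, spreads, the -0.5 correlation and the nine percentiles below are all invented for illustration and are not the manuscript's values.

```python
import math
import random

# Hypothetical ensemble of (end-of-century warming in K, CO2 concentration in ppm).
# The means, spreads and the -0.5 correlation are invented for illustration only.
rng = random.Random(42)
ensemble = []
for _ in range(2000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    warming = 4.5 + 0.6 * z1
    co2 = 1000.0 + 100.0 * (-0.5 * z1 + math.sqrt(0.75) * z2)
    ensemble.append((warming, co2))

# Pick members by their rank in warming only (nearest-rank percentiles).
by_warming = sorted(ensemble, key=lambda member: member[0])
percentiles = [0, 5, 17, 33, 50, 67, 83, 95, 100]  # hypothetical choice of 9 percentiles
selected = [by_warming[round(p / 100 * (len(by_warming) - 1))] for p in percentiles]

all_co2 = [co2 for _, co2 in ensemble]
sel_co2 = [co2 for _, co2 in selected]
print(f"full ensemble CO2 range:   {min(all_co2):.0f}-{max(all_co2):.0f} ppm")
print(f"9 selected members, CO2:   {min(sel_co2):.0f}-{max(sel_co2):.0f} ppm")
```

Because the selected members are a subset of the ensemble, their CO2 range can never exceed the full range. One option, if the CO2 dimension matters for the land model's physiological response, would be to stratify the sample on both variables (or to sample the joint distribution directly), at a higher computational cost.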
Citation: https://doi.org/10.5194/egusphere-2023-2932-RC2
-
AC1: 'Comment on egusphere-2023-2932', Camilla Mathison, 01 Jul 2024
Status: closed
-
RC1: 'Comment on egusphere-2023-2932', Anonymous Referee #1, 22 Mar 2024
The authors propose a global downscaling system based on “pattern scaling” where the global-mean temperature is used to predict local mean values of all variables on a GCM grid, which is then used to drive an offline land model in order to calculate climate impacts. This approach is considerably cheaper than running a GCM and is proposed for generating large ensembles and testing more compete sets of scenarios while still providing regionally specific outputs.
I think this system is worth publishing but have some issues to raise that will require moderate revisions.
General Comments
- The success of the pattern scaling approach is not well tested in my opinion, mainly because most of the tests examine the change by end of the century under the RCP8.5 scenario, which is exactly the same one used to determine the pattern; and the change by end of century will dominate the variance that a max-likelihood linear fit is trying hardest to fit. Thus the success is built in by design and the agreement shown in e.g. Fig. 5 is meaningless. What is needed is an out-of-sample test such as the accuracy at mid-century, and/or for other RCP scenarios—the RCP6 or 4.5 scenarios would seem like the obvious test targets. The time series comparisons (Figs. 6,7) look OK but not great, as there are some errors mid-century that are as large as the signal. This suggests that the pattern scaling approach isn’t as accurate as we’d like.
In short, the paper needs to get rid of RCP8.5 tests and instead show tests on at least two other RCPs to give a realistic idea of out-of-sample performance. - One reason the performance isn’t always good, especially on precipitation, may be that the authors are ignoring the direct effects of CO2 on precipitation which are substantial (e.g. Bony et al. 2013). Past studies show that by combining the effects of CO2 and global-mean temperature, precipitation patterns can be well captured, but not based on temperature alone. Since the authors are already feeding CO2 and global-mean T to JULES, why not also use CO2 as a second predictor for the downscaling?
- I am very confused by the so-called “weather generator” since the text states that the same weather is used for every day of any given month (line 128). If so, that is extremely unrealistic and will produce extreme responses in the land model (since it will either rain every day of the month or not at all). This doesn’t sound like an actual weather generator. Little else is said about the weather generator except to cite Williams and Clark 2014—we should not have to look there to get basic information about what kind of weather is being inserted. If indeed the weather is being held constant for a whole month at a time and then switches to something else on the 1st of the next month, this needs to be highlighted as a significant limitation in discussing the results.
- The probabilistic framework being used is not clear from the description. Any probabilities will depend on the priors for example, which are not stated, and on what observations the probabilities are conditioned on which is also not stated. There are also some confusing statements in the text (see detailed comments below). This needs to be clarified if the intention is for this tool to be used for probability estimation. It looks like the pdfs are traced to an ensemble calculated by WGI of AR6 but still the assumptions should be stated here.
Detailed/Technical Comments
line 183: by constraining future projections here, do you mean constraining equilibrium climate sensitivity? This is not the same as constraining RCP projections directly (which depend on factors other than ECS, most importantly historical forcings).
line 196: this is stated a bit confusingly—I assume the CO2 is a forcer to the land model, not to the meteorology (which depends only on global-mean temperature).
line 200-202: I don’t understand why in your ensemble, CO2 and ECS would be correlated. I think this is because the important elements noted in Major Point 4 are missing. Even if you are conditioning on historical warming I don’t see why a higher future CO2 would imply a lower ECS. This would only make sense if you were targeting a specific warming, but that isn’t stated clearly and you are showing a spread of possible warmings for any given RCP, as occurs in standard GCM simulations where a prior is placed (implicitly and usually independently) on both ECS and carbon cycle parameters and this implies a posterior distribution of temperature at any future time. There are a number of past studies that obtain pdfs of future warming conditional on historical warming using EMICs, and this study should follow a similar approach; most of them use the GCM ECS distribution as a prior but some use observationally-constrained priors on ECS.
Fig. 3: y axis or caption needs to identify at what time the CO2 concentration is determined.
Fig. 4: upper right figure panel needs to specify what humidity it is (specific humidity, according to the text).
line 220-222: I think what this text means is that you are correlating the decadal means of the predicted vs. CMIP variables—please state clearly. I would not say the correlations are very good for precipitation, wind etc., since much of the map is around .4 or less which means only 20% of the variance is captured by the emulator.
--------
Bony S; Bellon G; Klocke D; Sherwood S; Fermepin S; Denvil S, 2013, 'Robust direct effect of carbon dioxide on tropical circulation and regional precipitation', Nature Geoscience, 6, pp. 447 - 451, http://dx.doi.org/10.1038/ngeo1799
Citation: https://doi.org/10.5194/egusphere-2023-2932-RC1 - The success of the pattern scaling approach is not well tested in my opinion, mainly because most of the tests examine the change by end of the century under the RCP8.5 scenario, which is exactly the same one used to determine the pattern; and the change by end of century will dominate the variance that a max-likelihood linear fit is trying hardest to fit. Thus the success is built in by design and the agreement shown in e.g. Fig. 5 is meaningless. What is needed is an out-of-sample test such as the accuracy at mid-century, and/or for other RCP scenarios—the RCP6 or 4.5 scenarios would seem like the obvious test targets. The time series comparisons (Figs. 6,7) look OK but not great, as there are some errors mid-century that are as large as the signal. This suggests that the pattern scaling approach isn’t as accurate as we’d like.
-
RC2: 'Comment on egusphere-2023-2932', Anonymous Referee #2, 04 Jun 2024
Review:
General Description:
The authors present a combination of three existing model approaches – a global climate model emulator (FaIR), a traditional pattern scaling approach, and the JULES land model. They term this chain of models PRIME, suggesting that "PRIME correctly represents the climate response for [these] known scenarios,..." and that "PRIME enables the state-of-the-art science to be used throughout the modelling chain starting from the latest scenarios all the way to the simulations of regional impacts."
Overall Comment:
If the paper were presented with a heading like "Pattern-scaling approaches to drive JULES" or something similar, then I think it would be a great addition to the scientific literature – as it is a fine example of how the chain from global emission scenarios to some land-based impact metrics can be made with a number of (simplified) assumptions. However, the paper presents itself as playing in a different league, e.g., to "bypass ESMs" (line 66), or implicitly suggests that ISIMIP bias-corrected ESM outputs could be replaced (lines 25 to 40ff). With these heightened expectations, I'm sorry to say that the paper is underwhelming. The reasons are:
- The chosen method to justify the adequateness of pattern scaling (e.g., Fig 4): The authors show the Pearson correlation coefficients between scaled patterns (derived apparently from a linear regression of CMIP6 output against global mean temperature) and the CMIP6 data. I am very confused about this choice, i.e., to use a Pearson correlation coefficient for "Evaluation" (see e.g., caption of Figure 4). Suppose there is no change in regional precipitation in a specific region under climate change. If the pattern scaling "gets it correct" and indicates zero changes for those grid points, the applied Pearson correlation coefficients would be around zero (as there is no linear relationship then between pattern scaling and CMIP6). Thus, authors should use either standard RMSE (see Chapter 3 of IPCC AR6 WG1) or other useful metrics – or explicitly justify their use of the Pearson correlation coefficients.
- Unclear p-values and low skill for 5 out of 8 variables: In Figure S2, the authors show the percentage of p-values, averaged over models and months. First, I couldn't find any statistical description of which null hypothesis was tested. That there is no climate change? The authors are strongly urged to complement the paper with a detailed statistical section that explains both their use of Pearson correlation coefficients (or, ideally, other evaluation metrics) and the p-values in Figure S2. On the substance: only three of the eight variables are shown to have satisfactory 'p-values' of <0.05 (whatever exactly was measured here). How can the authors regard the low percentages of p-values <0.05 (take, e.g., precipitation changes in Northern Europe or North America) as an indicator that "PRIME correctly represents the climate response for these known scenarios" (Abstract, line 16f.)? That seems a really far-fetched conclusion given the results presented in Figure S2 and Figure 4, which seem to suggest that, for precipitation, the PRIME pattern-scaling results are essentially useless for many key world regions.
- Almost the same as a study from a quarter of a century ago: As the authors state, PRIME is very similar to the pattern-scaling approach published by Huntingford and Cox in 2000. Indeed, the two papers have a very similar scope, with the earlier study using the TRIFFID model instead of JULES. And arguably, the study from a quarter of a century ago uses more elaborate time-series plots and statistics to showcase its merit (despite the generally strong limitations of pattern scaling for the majority of variables).
- Science has evolved since 2000: Probably most fundamentally, I am concerned about the following point. For many of the five poorly performing variables, the literature has progressed much further and has established that simple pattern scaling does not work satisfactorily. It works like a charm for regional mean temperature, and to some extent even for temperature extremes, but not for precipitation, wind, pressure, etc.
- Take precipitation, for example. As Allen and Ingram (2002, https://doi.org/10.1038/nature01092) and Andrews et al. (2010, https://doi.org/10.1029/2010GL043991) pointed out, the hydrological cycle is subject to various constraints and is not driven only by global or regional surface air temperature changes. Changes in the vertical energy budget of the troposphere, i.e., the GHG radiative forcing itself as well as aerosol–cloud interactions, have a substantial effect on precipitation.
- Take wind and storm tracks, for example: the key feature is that midlatitude storm tracks might move poleward due to a broadening of the Hadley cells (https://doi.org/10.1038/s41561-017-0001-8). That is fundamentally at odds with simple linear pattern scaling, which only scales up the response at a fixed location and cannot shift a feature in space.
- The somewhat poor alignment of the PRIME results with observations for the historical period underscores these fundamental shortcomings of pattern scaling beyond temperature, specific humidity and longwave downwelling radiation (which is not too different from lower-tropospheric air temperature). As in Huntingford and Cox (2000), Zelazowski et al. (2018) and others, the skill of the regional precipitation patterns remains so low that they are not very useful.
- PRIME is not open source, as the underlying JULES code is not open source, if I understand correctly.
- PRIME does not seem to be a tool in itself. Unless I overlooked it, the code availability section contains no PRIME code. The pointers are to the underlying energy balance model, to ESMValTool (which has far more functionality than pattern generation) and to JULES (available upon request). Thus, the paper seems to describe how to apply three other models in sequence. Yet PRIME is described as a 'tool' in itself. What am I misunderstanding?
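The storm-track argument above can be illustrated with a purely hypothetical, idealized profile: a Gaussian storm track whose centre moves 1° poleward per K of warming. The sketch derives a per-K pattern from a 4 K "training" world and applies it at 2 K. The function name `storm_track` and all parameter values are illustrative assumptions, not taken from any ESM.

```python
import math

def storm_track(lat, warming, centre0=45.0, shift_per_k=1.0, width=5.0):
    """Idealized zonal-mean storm-track intensity: a Gaussian in latitude whose
    centre moves poleward by shift_per_k degrees per K of global warming."""
    centre = centre0 + shift_per_k * warming
    return math.exp(-((lat - centre) / width) ** 2)

lats = [30.0 + 0.5 * i for i in range(81)]  # 30N to 70N every 0.5 degrees
base = [storm_track(la, 0.0) for la in lats]
# Linear pattern (change per K), "trained" on the 4 K state.
pattern = [(storm_track(la, 4.0) - b) / 4.0 for la, b in zip(lats, base)]

# Apply the pattern to a 2 K world and compare with the true 2 K profile.
scaled_2k = [b + 2.0 * p for b, p in zip(base, pattern)]
true_2k = [storm_track(la, 2.0) for la in lats]
max_err = max(abs(s - t) for s, t in zip(scaled_2k, true_2k))
print(f"peak true = {max(true_2k):.2f}, peak scaled = {max(scaled_2k):.2f}, "
      f"max error = {max_err:.2f}")
```

The scaled 2 K profile is simply the average of the 0 K and 4 K tracks, so it is broadened and its peak is too weak (about 0.85 instead of 1 in this setup): linear scaling can only change amplitudes at fixed locations; it cannot move the track.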
Overall Recommendation:
If the authors undertake major revisions, possibly combining a scaling back of the bold expectations that they raise among readers with additional robust skill statistics and an extensive discussion of limitations, I think the paper can be a very valuable contribution. Systems like the proposed PRIME are definitely needed, and the demand for probabilistic climate impact projection systems is clearly there. However, the paper has to be honest about the various shortcomings, rather than claiming that it correctly represents the climate projections of CMIP6 models (e.g., line 16 and at many other places). To be frank, the system that Huntingford and Cox (2000) described was not too dissimilar.
Detailed Comments:
- Line 15: ‘which was used to define patterns’. A stricter delineation into "training" and "verification" data throughout the manuscript would be appreciated. For example, it is not clear whether Figure 4 is based on the SSP5-8.5 data or not. It better not be.
- Line 19: 'being used for impact assessments'. Monthly mean patterns may be sufficient for some impact applications, but a simple pattern-scaling approach that, for example, completely loses the covariance between temperature and precipitation extremes is not useful for all impact studies. As mentioned above, more precise wording would be appropriate (or a bit more humility, or both).
- Line 47 ff.: A more comprehensive discussion of the many similarities with, and few differences from, the rich history of pattern-scaling approaches seems useful: certainly the Huntingford & Cox (2000) study, the Mitchell review paper, etc.
- Line 66: ‘Bypassing ESMs’ is strong wording – and I would say inappropriate. At best, the proposed approach can approximate a few key characteristic outcomes of ESMs.
- Line 73: 'PRIME enables the state-of-the-art science to be used throughout the modelling chain'. It would be good if the authors could explain this a bit better, specifically with regard to changes in variability, compound risks, etc. (or choose less hyperbolic wording).
- Line 109ff: If I understand the method correctly, only 9 runs are undertaken with the energy balance model, scanning the stated percentiles. (By the way, it would be interesting to hear how confident the authors are in the min–max values, i.e., the 0th and 100th percentiles.) In general, I think the methods section needs to be expanded considerably, and the authors should clarify exactly how these 9 EBM runs are combined with the various patterns. Also, how does the methodology cater for hot and dry versus hot and wet futures? Is any covariance preserved across the variables?
- Line 115 ff.: As mentioned above, the methodology description in its current form does not allow the study to be reproduced. I would argue it should, even though the code is allegedly provided. For example, over which time window was the regression undertaken? Did it include all points from 1900 to 2100, or just the last twenty years (2081–2100) of the SSP5-8.5 run?
- Line 205, Figure 3: How does this plot show the validity of the chosen approach? Even though the underlying energy balance model produces a joint probability distribution of CO2 concentration and global mean temperature, the 9 percentiles sample only a fraction of that distribution in the CO2 concentration dimension?! The sampled CO2 concentrations in PRIME span 950 ppm to 1100 ppm, while the energy balance model output suggests a range from 800 ppm to 1200 ppm?! In some regions (such as the Amazon), an 800 ppm or 1200 ppm CO2 concentration might well produce a different physiological plant response, yet the probabilistic PRIME framework does not seem to propagate that information forward.
- Line 230, Figure 5: If SSP5-8.5 data are used to derive the PRIME patterns, comparing SSP5-8.5 CMIP6 output with PRIME is hardly useful. That is like presenting training data as independent verification, which gives a misleading impression of the model's skill.
- Line 238 and vicinity: The authors just skim over the fact that when compared to other scenarios that are not used for deriving the patterns, the skill gets worse (partly understandable because of the lower signal-to-noise ratio in lower scenarios). The authors need to unpack that with quantifiable information and more details (i.e., which periods, which models were looked at, what is the difference across CMIP6 models, how is the transient skill before the end of the century, etc.).
- Line 238: When the authors write 'However, the high correlations and low RMSE give us confidence to apply the pattern scaling…', I am not sure which 'high correlations' they are referring to: the SSP5-8.5 ones from which the patterns were derived, or those for the lower SSP scenarios? If the former, then the high correlations should not give anyone confidence that the model is skilful outside its training scope. If the latter, then the detailed plots in the Supplement show the usual picture for linear pattern scaling: it is very good for some variables and absolutely unusable for others. For example, Figure S8 on downward shortwave radiation is the 'prime' (pardon the pun) example of a variable for which pattern scaling with global mean temperature does NOT work.
- Table 1: It would probably be useful to the reader if the RMSE values were put into the context of the mean change in the respective variables.
- Table 1: The stated Pearson correlation coefficients seem oddly high, and I can't reconcile them with the regionally disaggregated ones shown in Figure 4. For example, take precipitation in Figure 4: judging from the color scale, the global-mean Pearson correlation coefficient (if it were meaningful, see above) would be somewhere between 0.3 and 0.7. Yet the results for all three SSPs show values well above 0.83 in Table 1, even 0.97 for SSP5-8.5!? Please add much more methodological detail and/or code showing exactly what was done, and explain how these numbers are consistent.
- Line 313: 'These comparisons allow us to confidently use the PRIME framework to assess impacts…' Again, I think this is another example of slightly overconfident language that does not appropriately reflect the various limitations.
- Line 317: The authors write: 'Hence it is novel to show a full probabilistic range of the possible spread of simulated carbon balance (represented by NEP)'. See above. I doubt whether 'full probabilistic' is the right term here, as, e.g., the CO2 concentration uncertainties do not seem to be explored according to Figure 3.
- Line 359: The MESMER tool showed some improvements for surface air temperature when using additional predictors. However, regional surface air temperature is the variable that is already predicted stunningly well with a pattern-scaling approach; nobody challenges the usefulness of pattern scaling for monthly mean regional temperatures. The authors seem to suggest here, though, that this marginal improvement would also hold for pattern scaling more generally. Really? I highly doubt it, given the literature on scaling regional precipitation, for example, where GHG forcing and (regional) aerosol forcing have clearly been shown not merely to provide marginal improvements but to be vital predictors without which precipitation cannot be adequately projected (see above).
- Line 384: The authors write: 'Overall we have shown PRIME faithfully reproduces the climate response…'. If the authors rephrased this conclusion as something like 'Within the known limits of linear pattern-scaling approaches, we have shown that 3 of the 8 investigated variables can be adequately projected in terms of their individual monthly means' or similar, I would have no issue with it. But conclusions like the one above are far too confident in my view.
- I can see how much dedicated work went into this manuscript, which is why I apologize that I cannot be more positive. With major revisions, I think this manuscript can add a useful contribution to a very important field – but the current form of the manuscript requires an overhaul from multiple angles in my perception.
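The Figure 3 comment (Line 205) can be illustrated with a toy ensemble. We do not know exactly how PRIME selects its nine runs; this sketch assumes, hypothetically, that each run is the ensemble member at one of nine warming percentiles (the percentile list below is invented) and that CO2 and warming are only weakly coupled. All numbers are illustrative, not taken from FaIR or PRIME.

```python
import random

random.seed(7)
N_MEMBERS = 2000

# Hypothetical, illustrative ensemble: end-of-century CO2 (ppm) varies with
# carbon-cycle uncertainty, while warming (K) depends only weakly on CO2 because
# climate sensitivity varies independently. NOT values from FaIR or PRIME.
members = []
for _ in range(N_MEMBERS):
    co2 = random.gauss(1000.0, 80.0)
    warming = 3.5 + 0.002 * (co2 - 1000.0) + random.gauss(0.0, 0.6)
    members.append((warming, co2))

# Assumed selection rule: keep the member at each of nine warming percentiles.
members.sort()  # tuples sort by warming first
percentiles = [0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0]  # hypothetical
picked = [members[round(p * (N_MEMBERS - 1))] for p in percentiles]

all_co2 = [c for _, c in members]
picked_co2 = [c for _, c in picked]
full_range = max(all_co2) - min(all_co2)
picked_range = max(picked_co2) - min(picked_co2)
print(f"CO2 range, full ensemble: {min(all_co2):.0f}-{max(all_co2):.0f} ppm")
print(f"CO2 range, 9 warming-percentile runs: {min(picked_co2):.0f}-{max(picked_co2):.0f} ppm")
```

Selecting on warming alone leaves the CO2 dimension, and hence any CO2-dependent plant physiology, only partly sampled. Sampling CO2 explicitly (e.g., a joint design) would be one way to address this, offered here as a suggestion rather than a description of PRIME.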
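The Line 359 point (forcing as a vital predictor for precipitation) can be sketched with a toy model. It assumes, hypothetically, that the true local precipitation response is a slow temperature-driven term plus a fast forcing-driven term, in the spirit of Andrews et al. (2010), with illustrative coefficients and an idealized one-box temperature model. A temperature-only, no-intercept regression is trained on a ramp scenario and tested on a step-forcing scenario.

```python
# Hypothetical "true" local precipitation response (% change): a slow,
# temperature-driven term plus a fast, forcing-driven term. Illustrative only.
K_T, K_F = 2.0, -0.5  # % per K and % per W m-2

def one_box_warming(forcing, sensitivity=0.8, tau=20.0):
    """Idealized one-box temperature response (K) to annual forcing (W m-2)."""
    temps, t = [], 0.0
    for f in forcing:
        t += (sensitivity * f - t) / tau
        temps.append(t)
    return temps

def truth(temps, forcing):
    return [K_T * t + K_F * f for t, f in zip(temps, forcing)]

def rmse(pred, obs):
    return (sum((a - b) ** 2 for a, b in zip(pred, obs)) / len(obs)) ** 0.5

# "Training" scenario: forcing ramps up steadily, as in a high-emissions run.
train_f = [0.08 * yr for yr in range(100)]
train_t = one_box_warming(train_f)
train_p = truth(train_t, train_f)

# Temperature-only scaling: least-squares slope through the origin.
b_t_only = sum(t * p for t, p in zip(train_t, train_p)) / sum(t * t for t in train_t)

# Temperature + forcing: solve the 2x2 normal equations directly.
a11 = sum(t * t for t in train_t)
a12 = sum(t * f for t, f in zip(train_t, train_f))
a22 = sum(f * f for f in train_f)
b1 = sum(t * p for t, p in zip(train_t, train_p))
b2 = sum(f * p for f, p in zip(train_f, train_p))
det = a11 * a22 - a12 * a12
k_t = (a22 * b1 - a12 * b2) / det
k_f = (a11 * b2 - a12 * b1) / det

# "Test" scenario with a different forcing history: a step to 4 W m-2, held fixed.
test_f = [4.0] * 200
test_t = one_box_warming(test_f)
test_p = truth(test_t, test_f)

rmse_t_only = rmse([b_t_only * t for t in test_t], test_p)
rmse_t_and_f = rmse([k_t * t + k_f * f for t, f in zip(test_t, test_f)], test_p)
print(f"RMSE, T only: {rmse_t_only:.2f} %; RMSE, T + forcing: {rmse_t_and_f:.2e} %")
```

Because the temperature-only slope absorbs the forcing-to-warming ratio of the training scenario, it fails when that ratio changes, whereas including forcing recovers the assumed response. Real precipitation responses are far more complex; the sketch only shows why the extra predictor is not a marginal refinement.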
Citation: https://doi.org/10.5194/egusphere-2023-2932-RC2
AC1: 'Comment on egusphere-2023-2932', Camilla Mathison, 01 Jul 2024
Data sets
- FaIR: Calibration data for FaIR v1.6.2, Chris Smith, Zenodo: https://doi.org/10.5281/zenodo.6601980
- ESMValTool Climate patterns code, Greg Munday, Eleanor Burke and Chris Huntingford: https://zenodo.org/records/10635588
- Temps and CO2 concentrations for running PRIME from FaIRv1.6.4, Camilla Mathison and Chris Smith: https://zenodo.org/records/10524337
- JULES output from PRIME version 1, Eleanor Burke and Camilla Mathison: https://doi.org/10.5281/zenodo.10634291
Model code and software
- FaIR v1.6.2, Chris Smith: https://doi.org/10.5281/zenodo.4465032