This work is distributed under the Creative Commons Attribution 4.0 License.
Constraining uncertainty in projected precipitation over land with causal discovery
Abstract. Accurately projecting future precipitation patterns over land is crucial for understanding climate change and developing effective mitigation and adaptation strategies. However, projections of precipitation changes in state-of-the-art climate models still exhibit considerable uncertainty, in particular over vulnerable and populated land areas. This study addresses this challenge by introducing a novel methodology for constraining climate model precipitation projections with causal discovery. Our approach is a multistep procedure that integrates dimension reduction, causal network estimation, causal network evaluation, and a causal weighting scheme based on historical performance (the distance of a model's causal network to the causal network of a reanalysis dataset) and the interdependence of CMIP6 models (the distance of a model's causal network to the causal networks of other climate models). To uncover the significant causal pathways crucial for understanding dynamical interactions in the climate models and reanalysis datasets, we estimate the time-lagged causal relationships using the PCMCI causal discovery algorithm. In the last step, a novel causal weighting scheme is introduced, assigning weights based on the performance and interdependence of the CMIP6 models' causal networks. For the end-of-century period 2081–2100, our method reduces the very likely ranges (5th–95th percentile) of projected precipitation changes over land by 10 to 16 % relative to the unweighted ranges across three global warming scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5). The likely ranges (17th–83rd percentile) are reduced further, by 16 to 41 %. This methodology is not limited to precipitation over land and can be applied to other climate variables, supporting better mitigation and adaptation strategies to tackle climate change.
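To make the causal network estimation step concrete, below is a minimal sketch of time-lagged causal discovery with the tigramite package, which implements PCMCI. The number of components, maximum lag, and significance level are illustrative placeholders, not the settings used in the study.

```python
# Minimal PCMCI sketch with tigramite; all settings are illustrative only.
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
# In older tigramite versions: from tigramite.independence_tests import ParCorr
from tigramite.independence_tests.parcorr import ParCorr

# Toy stand-in for N component time series (e.g., Varimax-rotated
# principal components of detrended, anomalized sea-level pressure).
T, N = 1000, 5
data = np.random.default_rng(0).standard_normal((T, N))

dataframe = pp.DataFrame(data, var_names=[f"comp_{i}" for i in range(N)])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())

# Estimate time-lagged dependencies up to a maximum lag; the significant
# links define the causal network used to compare models and reanalyses.
results = pcmci.run_pcmci(tau_min=1, tau_max=5, pc_alpha=0.05)
links = results["p_matrix"] <= 0.05  # boolean links, shape (N, N, tau_max + 1)
```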
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2656', Anonymous Referee #1, 18 Oct 2024
This manuscript introduces a method to constrain precipitation projections of climate models based on causal discovery, specifically causal associations between global sea level pressures. The study compares several CMIP6 historical simulations to NCEP/NCAR and ERA5 reanalysis datasets in terms of how lagged causal dependencies are reproduced by each model. Each model is attributed an associated score and weighting for an ensemble average, and projected changes in precipitation are compared across models with different scores. The authors show that while mean changes in precipitation do not change with weighting, the range of precipitation changes is constrained when causal dependencies are considered, and there is significant spatial variability.
This is an interesting paper that shows an example of a causally based emergent constraint on climate projections, and illustrates a non-linear relationship between model score (how well the model matches observed causal dependencies) and projected changes in precipitation. However, there are some places where clarifications could be made, and some overstatements regarding the extent to which model scores relate to projected changes in precipitation. These are outlined in the comments below and consist of minor to moderate revisions.
Section 2.2 on Data Preprocessing: It is stated that the data are detrended and anomalized by subtracting the climatological monthly mean and dividing by the monthly variance, and also separated by season. This type of method is needed to make the time series more stationary. However, it seems like subtracting a monthly mean (as opposed to a moving average) would not necessarily detrend the daily data. For example, for SLP increasing through June and July, subtracting the monthly means would retain this trend within each month, and could also cause a non-physical step change between the end of June and the beginning of July, is that correct? Some detail or basis for this method might be useful.
In section 2.2, it would also be helpful if the “different causal dependencies are expected for each meteorological season” aspect could be explained, since some of the trends associated with seasonal changes are already being mitigated due to the detrending and anomalizing. Later in the paper, there is not much mention of seasonal differences besides Figures A1 and A2, only global and spatial differences, so it seems like this aspect is largely dropped.
Figure 1: some of the font is very tiny in this figure. It would be better to omit the text since these images are mainly for illustrative purposes.
Results: The first two results sections, 3.1 and 3.2, focus on figures that are presented in the appendix. I would recommend combining/condensing these sections and getting to the “main results” regarding precipitation earlier. This is an editorial opinion, but it would be better to focus on the main text figures more in the main text, and place appendix figure references parenthetically to support particular statements. As it is, I felt like I was always moving back and forth from the appendix to the text and would lose the thread of the paper.
Figures B1 and B2 seem to have the same caption, although I see some of the links are different between the figures – from the text in Section 3.2, it seems like one of these should not be ERA5, but NCEP/NCAR? In general, it seems like it would be better to combine Figures B1 and B2 into one figure with different colored arrows, since otherwise it is hard to go back and forth and see what the differences are.
Similarly, it would be easier to read Figure 2 if it were combined into a single bar chart with 2 colors for each comparison case (or hatching of bars, etc), which would allow a more specific comparison of F1 score for each model, and would reduce labeling.
Line 298: “models with similar causal networks have high similarities” – isn’t Figure 3 based on causal networks? In other words, isn’t this statement redundant (similar models are similar)? This could be made clearer.
Line 309 and Figure 4: Regarding the parabolic relationships, they seem to be “statistically significant” based on the p-value, but I would not say they are “strong”, at least not visually from Figure 4 or without some other metric of goodness of fit. In Figure 4, the blue lines seem distracting (and statistically non-significant), and it would be better to show a confidence interval around the parabolic fit. A non-statistically significant fit could also be indicated with a dotted line to better highlight that aspect. Another statistical test could be a hypothesis test along the lines that the range of delta_Precip values is smaller above a certain threshold of F1 scores. In other words, maybe a stronger statistical case could be made that omitting the lowest F1 scores results in a tighter and more moderate range of precipitation values.
Line 320: x-axis of what? (Figure 4). Here is another reference to a “distinct peak” which seems a little tenuous given the scatter in Figure 4 for some of the cases.
Line 325: previous findings in this study (Figures 1-4) or in other climate model studies?
Citation: https://doi.org/10.5194/egusphere-2024-2656-RC1
AC1: 'Reply on RC1', Kevin Debeire, 15 Nov 2024
General response:
We thank the reviewer for finding our work interesting. We acknowledge the need for improved clarity and have addressed these areas to avoid overstatements. Your comments regarding the need for more detailed explanations (e.g., data preprocessing, seasonal differences), suggestions to improve the clarity of figures, and recommendations to streamline the presentation of results have helped us better communicate the core findings of this work. We have carefully addressed each of your specific comments and made the corresponding revisions to the manuscript.
Comment 1 (Section 2.2 preprocessing):
Thank you for highlighting this. We will revise Section 2.2 to include a detailed explanation of our detrending and anomalization, specifying that the pre-processing includes an initial linear detrending on a grid-cell basis, followed by anomaly calculation using a long-term daily climatology (correcting the original typo regarding monthly means). The anomalies are calculated by subtracting each day's climatological mean and dividing by the standard deviation. These steps are essential because, while SLP data is largely stationary even under historical forcing (Nowack et al., 2020), it is necessary to remove any small trends to ensure robust causal discovery. The causal discovery method relies on the assumption of stationarity to accurately identify dependencies, making this pre-processing a prudent step.
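For concreteness, a minimal numpy sketch of this pre-processing; the array shapes, function name, and day-of-year handling are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def detrend_anomalize(slp, day_of_year):
    """slp: (time, lat, lon) daily SLP; day_of_year: (time,) ints in 1..366."""
    t = np.arange(slp.shape[0])
    flat = slp.reshape(slp.shape[0], -1)
    # Step 1: remove a linear trend at each grid cell.
    slope, intercept = np.polyfit(t, flat, 1)
    detrended = (flat - (np.outer(t, slope) + intercept)).reshape(slp.shape)
    # Step 2: standardized anomalies against the long-term daily
    # climatology: subtract each calendar day's mean, divide by its std.
    anomalies = np.empty_like(detrended)
    for day in np.unique(day_of_year):
        sel = day_of_year == day
        mean = detrended[sel].mean(axis=0)
        std = detrended[sel].std(axis=0)
        anomalies[sel] = (detrended[sel] - mean) / std
    return anomalies
```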
Comment 2 (Section 2.2 seasonality):
The motivation for analyzing each season separately (DJF, MAM, JJA, SON) was to examine seasonal variability in causal dependencies. We acknowledge that detrending and anomalizing remove some seasonal trends; however, the seasonal differences in climate dynamics are still substantial. For example, the Principal Component Analysis (PCA) with Varimax rotation reveals that the components shift significantly between seasons (see Fig. A1 and A2). Consequently, the causal networks of the components also change substantially between seasons, further justifying a seasonally separate approach. In the paper, we found consistent results across all seasons in terms of detecting model similarities and performance, which supported the use of annual F1-scores averaged over seasons in the main findings (see for example Fig. 2).
However, to better address this point, we have discussed these results more explicitly in the text. Additionally, we have included supplementary figures (similar to Fig. 2 but for seasonal F1-score) to illustrate the similarity of F1-scores and model rankings across seasons, providing further support for our approach.
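To illustrate the dimension reduction step mentioned above, here is a minimal sketch of PCA with a textbook Varimax rotation applied to a seasonal anomaly matrix; the shapes and number of components are placeholders rather than the paper's configuration:

```python
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard Varimax rotation of a (features, components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        lam = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (lam ** 3 - (gamma / p) * lam @ np.diag((lam ** 2).sum(axis=0)))
        )
        rotation = u @ vt
        if s.sum() < var * (1 + tol):  # stop when the criterion converges
            break
        var = s.sum()
    return loadings @ rotation

# X: (time, gridpoints) seasonal SLP anomalies (illustrative sizes).
X = np.random.default_rng(1).standard_normal((500, 200))
pca = PCA(n_components=10).fit(X)
rotated = varimax(pca.components_.T)  # (gridpoints, components) loadings
component_series = X @ rotated        # component time series fed to PCMCI
```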
Comment 3 (Figure 1):
Thank you for noting the font size in Figure 1. We agree that the small text may be difficult to read. To address this, we have removed all tiny fonts from the figure, focusing solely on its illustrative purpose. This simplification improves clarity while maintaining the figure's intended role.
Comment 4 (Results Sections 3.1 and 3.2):
Thank you for your suggestion. We agree that the result sections for Step 1 and Step 2 could be shortened. We have combined sections 3.1 and 3.2 and moved the detailed discussion of the appendix figures (A1, A2, B1, B2, C1 and C2) to the appendix itself, keeping the main text more concise and focused. Following these changes, the "main results" appear earlier in the main text.
Comment 5 (Figures B1 and B2):
Thank you for your comment. The caption for Fig. B1 incorrectly listed ERA5 instead of NCEP/NCAR, and we have corrected this. However, we could not combine Figures B1 and B2 into a single figure with overlapping arrows because the modes and their positions on the map differ between the two figures. Instead, we have regrouped the figures into a single figure with two subfigures (B1 and B2). Additionally, to better highlight the differences, we have changed the color of the causal dependencies in Figure B2 (ERA5) to red arrows, making it easier to understand that the datasets differ between the subfigures.
Comment 6 (Figure 2):
We agree that consolidating Figure 2 into a single bar chart with distinct colors would improve readability. We have implemented this suggestion.
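For reference, a minimal sketch of the network comparison behind these F1-scores, assuming both networks are stored as boolean link arrays of shape (N, N, tau_max + 1), with the reanalysis network treated as the reference:

```python
import numpy as np

def f1_score_links(model_links, ref_links):
    """F1 of a model's causal links against reference (reanalysis) links."""
    tp = np.logical_and(model_links, ref_links).sum()   # links in both
    fp = np.logical_and(model_links, ~ref_links).sum()  # spurious links
    fn = np.logical_and(~model_links, ref_links).sum()  # missed links
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```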
Comment 7 (Line 298):
The phrase is indeed redundant. We have rephrased it to: "Models with shared developmental features exhibit higher causal network similarity, likely due to their comparable dynamical representations".
Comment 8 (Line 309 and Figure 4):
We agree that the term "strong" may overstate the relationship. We have adjusted the language in the manuscript to reflect the statistical significance without overstating the strength, rephrasing it to indicate "an approximately parabolic relationship over the space of opportunities covered by CMIP6 models."
To improve interpretation, we have included confidence intervals around the parabolic fit in Figure 4 and adjusted the non-significant blue line to a dotted line.
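As a sketch of how such a confidence band can be obtained, the quadratic fit can be bootstrapped; x (F1-scores) and y (projected precipitation changes) below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.3, 0.7, 30)                              # per-model F1-scores
y = 0.05 - 2.0 * (x - 0.5) ** 2 + rng.normal(0, 0.01, 30)  # toy responses

coeffs = np.polyfit(x, y, 2)                 # parabolic (degree-2) fit
xs = np.linspace(x.min(), x.max(), 100)

# Resample models with replacement and refit to get a pointwise band.
boot = np.empty((1000, xs.size))
for b in range(boot.shape[0]):
    idx = rng.integers(0, x.size, x.size)
    boot[b] = np.polyval(np.polyfit(x[idx], y[idx], 2), xs)
lower, upper = np.percentile(boot, [5, 95], axis=0)  # 5-95 % band
```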
Comment 9 (Line 320):
We have specified in the text that we refer to the x-axis in Figure 4 and use "peak" instead of "distinct peak" (line 322).
Comment 10 (Line 325):
"Previous findings" here refer specifically to results within this study. We have clarified this in the text to prevent misinterpretation.
Kind regards,
The authors.
Reference:
Nowack, P., Runge, J., Eyring, V., & Haigh, J. D. (2020). Causal Networks for Climate Model Evaluation and Constrained Projections. Nature Communications, 11, 1415. https://doi.org/10.1038/s41467-020-15195-y
Citation: https://doi.org/10.5194/egusphere-2024-2656-AC1
RC2: 'Comment on egusphere-2024-2656', Anonymous Referee #2, 20 Oct 2024
In Section 2.2, Lines 145-150, it is stated that "the daily data is detrended and anomalized by subtracting the climatological monthly mean and dividing by the monthly variance, which reduces seasonality in the data." From this description, it appears that the sea-level pressure (SLP) timeseries is detrended. However, it should be clarified how exactly this detrending is applied. If the goal is to weight models based on the performance of forced response relative to observation, removing the long-term trend associated with the forced response could affect the metric used to assess model performance. Could the authors clarify this point explicitly or provide additional commentary on the approach?
In Section 3.5, a thorough evaluation of the robustness of the weighting method is needed to avoid overconfident constrained projections. Several key aspects require assessment:
1. Whether SLP alone can adequately capture the model differences in future precipitation changes. Even if the physical connection between SLP metrics and precipitation changes is clear, other factors may contribute to model differences in precipitation, which could lead to overly confident projections if based solely on SLP.
2. The robustness of observationally constrained projections derived from the causal discovery method needs to be carefully evaluated. A leave-one-model-out test would be a valuable approach to assess potential overconfidence in the analysis and ensure the reliability of the results (Ribes et al., 2021; Brunner et al., 2020).
The influence of internal variability should be carefully considered in this analysis. Given that the observed data during the selected period may be partly influenced by internal variability, this could introduce noise in the comparison between model simulations and observations. If not sufficiently addressed, this variability might bias the weighting applied to the models, leading to less reliable constrained projections. It would be helpful if the authors could clarify how internal variability is accounted for in the weighting process and whether measures have been taken to minimize its impact.
References
Ribes, A., Qasmi, S., & Gillett, N. P. (2021). Making climate projections conditional on historical observations. Science Advances, 7(4), eabc0671.
Brunner, L., Pendergrass, A. G., Lehner, F., Merrifield, A. L., Lorenz, R., & Knutti, R. (2020). Reduced global warming from CMIP6 projections when weighting models by performance and independence. Earth System Dynamics, 11(4), 995-1012.
Citation: https://doi.org/10.5194/egusphere-2024-2656-RC2
AC2: 'Reply on RC2', Kevin Debeire, 15 Nov 2024
General response:
We thank the reviewer for their thoughtful and constructive comments, which have helped us evaluate and enhance the robustness and clarity of our study. Your feedback, particularly regarding the robustness of our weighting method, the role of internal variability, and the role of additional variables to constrain precipitation, has led to significant improvements in the manuscript. Below, we respond to each of your individual comments.
Comment 1 (Section 2.2 preprocessing):
We have extended and revised the explanation in Section 2.2 to clarify the detrending and anomalizing approach: the pre-processing includes an initial linear detrending on a grid-cell basis, followed by anomaly calculation using a long-term daily climatology (correcting the original typo regarding monthly means). The anomalies are calculated by subtracting each day's climatological mean and dividing by the standard deviation. These steps are essential because, while SLP data is largely stationary even under historical forcing (e.g., Nowack et al., 2020), it is necessary to remove any small trends to ensure robust causal discovery. The causal discovery method relies on the assumption of stationarity to accurately identify dependencies, making this pre-processing a prudent step.
Comment 2 (Section 3.5):
We are not entirely certain if we fully understood the reviewer’s suggestion. Do you mean a leave-one-model-out approach to test whether the incorporation of an additional variable in the weighting could lead to an improvement? Alternatively, could the suggestion involve evaluating the causal networks against networks derived from one of the other models treated as a "ground truth," and assessing whether constrained projections move closer to that model?
In our study, to avoid overconfidence in weighting, a “perfect model” calibration approach was used, specifically tuning the performance shape parameter. This method evaluates robustness by treating each model iteratively as “truth,” ensuring weighted projections are not overconfident.
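For illustration, a minimal sketch of a performance-and-independence weighting of the general form used in Brunner et al. (2020), with a perfect-model loop that checks coverage for a given shape parameter sigma_d; the distance matrices and projections are placeholders, and the exact weighting used in the paper may differ:

```python
import numpy as np

def weights(D, S, sigma_d, sigma_s):
    """D: (M,) model-to-truth distances; S: (M, M) model-to-model
    distances with a zero diagonal (here: causal-network distances)."""
    performance = np.exp(-(D / sigma_d) ** 2)
    # Summing over all j includes exp(0) = 1 for j = i, which gives
    # the usual 1 + sum over the other models.
    independence = np.exp(-(S / sigma_s) ** 2).sum(axis=1)
    w = performance / independence
    return w / w.sum()

def perfect_model_coverage(proj, S, sigma_d, sigma_s, lo=10, hi=90):
    """Fraction of pseudo-truth models whose projection lies inside the
    weighted lo-hi percentile range of the remaining models."""
    M, hits = proj.size, 0
    for truth in range(M):
        rest = np.delete(np.arange(M), truth)
        D = S[rest, truth]  # treat the held-out model as observations
        w = weights(D, S[np.ix_(rest, rest)], sigma_d, sigma_s)
        order = np.argsort(proj[rest])
        cdf = np.cumsum(w[order])
        lo_val = np.interp(lo / 100, cdf, proj[rest][order])
        hi_val = np.interp(hi / 100, cdf, proj[rest][order])
        hits += lo_val <= proj[truth] <= hi_val
    return hits / M

# Example: pick sigma_d so that coverage is close to the nominal 80 %.
rng = np.random.default_rng(3)
M = 20
S = rng.uniform(0.1, 1.0, (M, M)); S = (S + S.T) / 2; np.fill_diagonal(S, 0)
proj = rng.normal(0.03, 0.02, M)
for sigma_d in (0.2, 0.4, 0.8):
    print(sigma_d, perfect_model_coverage(proj, S, sigma_d, sigma_s=0.5))
```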
We acknowledge the limitation of using only SLP for the causal weighting scheme and mention future research plans to incorporate additional diagnostics such as temperature, following precedents in emergent constraints on global precipitation (e.g., Dai et al., 2024; Shiogama et al., 2022). This multi-diagnostic approach could improve precipitation projections further by addressing model differences more comprehensively. In addition, we acknowledge that the dynamical aspects covered by the causal networks constrain only part of the drivers of precipitation. However, this focus on a dynamical quantity such as SLP was intentional, as incorporating thermodynamic quantities like surface temperature would make the causal networks far less stationary. Building multivariate causal networks that include thermodynamic variables would require significant methodological adjustments and is beyond the scope of this study. We recognize this as an important direction for future research.
Comment 3 (Internal variability):
We appreciate the reviewer highlighting the critical role of internal variability in our analysis. We recognize that both the observational data and the climate model simulations contain internal variability, which can introduce noise and potentially bias the comparison between models and observations. To mitigate its influence, multiple ensemble members for each model were processed, with causal networks derived independently for each. The final F1-scores represent an ensemble average, which reduces the variability effects by smoothing out member-specific results. Recognizing that reanalysis datasets themselves are subject to internal variability and measurement uncertainties, we have analyzed multiple reanalysis products (ERA5 and NCEP/NCAR). Although not eliminating internal variability completely, this approach provides a more robust comparison by reducing noise from individual ensemble members. We have added this clarification on how we account for internal variability in the main text of the paper.
Additionally, we note that internal variability itself offers opportunities to learn about the robustness of our method. Specifically, we have found differences between the causal networks of the models, which were shown to be larger than the differences between the causal networks across ensemble members of individual models. This supports the idea that the differences we capture are meaningful and not purely due to internal variability. This finding aligns with results from previous work (Nowack et al., 2020), where this was demonstrated clearly.
Kind regards,
The authors.
References:
Dai, P., Nie, J., Yu, Y., & Wu, R. (2024). Constraints on Regional Projections of Mean and Extreme Precipitation under Warming. Proceedings of the National Academy of Sciences, 121, e2312400121. https://doi.org/10.1073/pnas.2312400121
Nowack, P., Runge, J., Eyring, V., & Haigh, J. D. (2020). Causal Networks for Climate Model Evaluation and Constrained Projections. Nature Communications, 11, 1415. https://doi.org/10.1038/s41467-020-15195-y
Shiogama, H., Watanabe, M., Kim, H., & Hirota, N. (2022). Emergent Constraints on Future Precipitation Changes. Nature, 602, 612–616. https://doi.org/10.1038/s41586-021-04310-8
Citation: https://doi.org/10.5194/egusphere-2024-2656-AC2