Structural warming biases distort extreme rainfall intensification estimates in event attribution
Abstract. Extreme event attribution (EEA) is becoming an increasingly important component of climate change risk assessment and communication. While most EEA methods rely on numerical models, the extent to which model fidelity in representing anthropogenic warming shapes attribution outcomes remains underexplored. Here we identify global-scale biases in leading CMIP6 climate models relative to reanalysis data and show that these biases directly propagate into EEA results. CMIP6 models reproduce the integrated magnitude of anthropogenic warming but systematically distort its three-dimensional structure, underestimating lower-tropospheric warming over land—thus dampening land–sea thermal contrasts—while overestimating upper-tropospheric warming, particularly in the Northern Hemisphere. Consequently, in a storyline-based testbed experiment for the October 2024 Valencia storm (Spain), the response in extreme rainfall rises from ~10 % under CMIP6-derived warming to ~30 % under an observationally constrained signal. This enhanced response is driven by increased low-level moistening, larger convective instability, and strengthened upper-level winds that push precipitation well beyond Clausius–Clapeyron scaling. We also show similar structural mismatches across multiple Northern Hemisphere mid-latitude locations, suggesting that this underestimation is not event-specific. Our results underscore the need to strengthen confidence in attribution methods and provide a robust pathway for constructing observationally constrained counterfactual climates.
The manuscript by Insua-Costa et al. examines how biases in temperature trends extrapolated from CMIP6 models might lead to incorrect counterfactuals for model-based extreme event attribution studies, taking the Valencia floods of 2024 as a relevant example. The focus is on the vertical structure of warming, which can change the background stability profile when superimposed on simulations of the event under analysis, e.g. in the context of pseudo-global warming approaches. Employing an array of different counterfactual simulations, the authors explain how differences in the dynamical and thermodynamical setup of the counterfactual simulation can substantially affect precipitation totals and, consequently, the attribution outcome.
While the thermodynamic arguments provided (increased CAPE, increased IWV with global warming) are consistent with the increase in precipitation expected in a warmer climate at a basic level, the dynamical argument based on thermal wind is very likely flawed. Furthermore, with the exception of the changes in stability in top-heavy vs bottom-heavy warming patterns, there is little physical understanding provided to help the reader connect changes in the vertical warming structure and precipitation outcomes. Given the convective nature of the event and the high degree of interaction with orography, small changes in the circulation can result in very large precipitation differences, with maxima of precipitation decaying or shifting away from the region of interest: this implies that the interpretation of precipitation changes between simulations should be done with great care, taking into account the variability within the ensemble.
The null hypothesis that what was observed in the Valencia floods is specific to the event, to the simulations, or to the region, and due to reasons other than the vertical structure of the warming, is not convincingly excluded. The generalization attempt is, therefore, not sufficiently supported by the results shown. Given that my comments challenge some very central points of the paper, I have to recommend very major revisions or rejection in its current form.
Major comments
1) I believe that the explanation of dynamical changes provided by the authors is not consistent with the basic synoptic meteorology concepts behind thermal wind balance. The explanation provided by the authors for the difference in upper-level flow strength involves changes in the land-sea thermal contrast that would propagate upward and affect free-tropospheric winds at 500hPa (Fig. 5f). Thermal wind balance follows directly from the assumptions of geostrophic and hydrostatic balance, and those assumptions are not satisfied at the typical scales of land-sea thermal contrasts. Such temperature gradients give rise to other types of mesoscale circulation, such as sea breezes, that are non-geostrophic and limited to the height of the boundary layer, with no effect on the upper troposphere. Otherwise, we would have a NE/SW-oriented jet stream along the western coast of the Mediterranean Sea forming virtually every afternoon during summer as solar heating warms the land surface and the sea stays relatively cool... which is not the case.
2) Knowing the temperature gradient at one layer should not be used to predict the wind several thousand meters above it, especially in a region with complex orography (rather than in the simplified setup of idealized jet streams over oceans, in which thermal wind is usually introduced during introductory meteorology classes). If the authors would like to continue with the thermal wind argument, they should at least compute the geostrophic and thermal wind and produce actual evidence that specific differences in low-level thermal wind are the dominant contributors to the 10-15% difference in 500hPa wind speed depicted in Fig. 5f, which then might impact vertical shear and convection organisation. This would result in a more mechanistic understanding of why changes in vertical warming structure should affect attribution results.
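As a sketch of the kind of diagnostic requested in comment 2, the thermal wind of a pressure layer can be obtained from the layer-mean temperature with the standard pressure-coordinate relation V_T = (R/f) ln(p_bottom/p_top) k × ∇<T>. The Python function below is only a minimal illustration, assuming a regular latitude-longitude grid and a precomputed layer-mean temperature field; the function name, argument names, and constants are my own choices and are not taken from the manuscript.

```python
import numpy as np

R_DRY = 287.05         # gas constant for dry air [J kg-1 K-1]
OMEGA = 7.292e-5       # Earth's rotation rate [s-1]
EARTH_RADIUS = 6.371e6 # mean Earth radius [m]

def thermal_wind(t_mean, lat_deg, lon_deg, p_bottom_hpa, p_top_hpa):
    """Thermal wind (u_T, v_T) in m/s for the layer p_bottom..p_top.

    t_mean  : 2-D layer-mean temperature [K], shape (nlat, nlon)
    lat_deg : 1-D latitudes [degrees], away from the equator
    lon_deg : 1-D longitudes [degrees]
    (Hypothetical interface, assumed here for illustration.)
    """
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))

    # Coriolis parameter, broadcast along longitude
    f = 2.0 * OMEGA * np.sin(lat)[:, None]

    # Horizontal temperature gradients on the sphere [K m-1]
    dT_dy = np.gradient(t_mean, lat, axis=0) / EARTH_RADIUS
    dT_dx = np.gradient(t_mean, lon, axis=1) / (EARTH_RADIUS * np.cos(lat)[:, None])

    # V_T = (R/f) ln(p_bottom/p_top) k x grad(T)
    factor = (R_DRY / f) * np.log(p_bottom_hpa / p_top_hpa)
    u_t = -factor * dT_dy
    v_t = factor * dT_dx
    return u_t, v_t
```

Applying such a calculation to the factual and counterfactual temperature fields, and comparing the resulting thermal wind difference with the simulated 500hPa wind difference, would indicate how much of the signal in Fig. 5f can actually be explained by thermal wind balance.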
3) Lacking a clear explanation of the connection between the vertical warming pattern and the wind, one might come to the simpler hypothesis that weaker upper-level winds over the Valencia region are simply due to a lateral shift of the region of maximum winds. In Fig. 4 the ERA5 counterfactual features a stronger sea-level pressure gradient with a deeper minimum east of Gibraltar, which enhances easterly winds over the southern portion of Spain (Fig. 4c): a similar shift might also be present at 500hPa, and would be consistent with the enhanced precipitation in the Murcia/Almería region visible in Fig. 4f. Please provide 500hPa wind difference maps between simulations to actually show the spatial pattern of the wind difference.
4) The explanation of the results is very difficult to understand and at times contradictory. Let us take the sentences at lines 271-274 and the 500hPa wind speed as an example, but the lack of clarity is widespread across Sections 3.3 and 3.4 and in the Conclusions. Here, we know that the ERA5 counterfactual features an increased land-sea contrast and this would correspond (by the authors' hypothesis) to a weakening of upper-level winds in the counterfactual simulation with respect to the factual (I guess because the thermal wind vector would be roughly aligned with the coastline and point south-westward if land is warmer than the sea, opposing the southerly flow driven by the DANA south of Gibraltar we see in Fig. 4). C < F is consistent with (F-C)/F > 0, as shown in Fig. 5f. The problem is in the wording used: when the authors write "the ERA5-based perturbations yield a significant strengthening of upper-level wind" (line 272) the reader might be confused, because in the ERA5-based counterfactual the upper-level winds are actually weaker than in the factual. This confusion becomes a contradiction in the conclusions, when it is said "The ERA5-based experiment yields enhanced low-level moistening, stronger convective instability and a more vigorous upper-level jet, producing a super-Clausius–Clapeyron precipitation response". Please revise the explanations to ensure consistency and clarity.
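To make the sign convention in comment 4 explicit, the short Python sketch below evaluates (F-C)/F for a pair of wind speeds. The numbers are hypothetical values chosen only to illustrate the sign and are not taken from the manuscript or from Fig. 5f.

```python
def relative_change(factual, counterfactual):
    """Return (F - C) / F; positive when the factual value exceeds the counterfactual."""
    return (factual - counterfactual) / factual

# Hypothetical 500 hPa wind speeds in m/s (illustrative only)
factual_wind = 30.0
counterfactual_wind = 26.0

change = relative_change(factual_wind, counterfactual_wind)
print(f"(F - C) / F = {change:+.3f}")
```

A positive value means the factual wind is the stronger one, so describing the result in terms of factual versus counterfactual, rather than attaching "strengthening" to the ERA5-based perturbation, would avoid the ambiguity.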
5) The generalization beyond Valencia presented in Section 4.3 is also not convincing, in my opinion. What is shown in Fig. 8 is simply that CMIP6 models, in most cities, tend to feature an upper-level warming bias with respect to the estimate derived from ERA5: such a result is consistent with the bias described by Keil et al. (2021, https://doi.org/10.1175/JCLI-D-21-0196.1) and mentioned in the introduction. However, this does not automatically imply that the precipitation response in London or Los Angeles will be the same as in Valencia when a similar approach is applied to study extreme events over those cities. The Valencia event was convectively dominated in the context of a DANA, while extreme precipitation in other parts of the world might be more stratiform and associated with other mechanisms (e.g., atmospheric rivers, cyclones). More physics-based understanding would better support the generalization attempt, and would also allow the authors to hypothesize for which climatic regions the proposed mechanism is expected to be more (or less) relevant.
Minor points
Section 3.1: The global-scale discussion presented in this section is not specifically relevant to the western Mediterranean case study and to the conclusions, and is in this sense a deviation with respect to the point of the paper. Consider removing or drastically shortening, by placing the changes in the western Mediterranean in the context of the North Atlantic/European/Mediterranean region only.
Lines 144-145: the role of the lapse rate feedback in inducing Arctic Amplification is currently being critically reconsidered; consider rephrasing this sentence (see Caballero, R., and T. M. Merlis, 2025: Polar Feedbacks in Clear-Sky Radiative–Advective Equilibrium from an Airmass Transformation Perspective. J. Climate, 38, 3399–3416, https://doi.org/10.1175/JCLI-D-24-0031.1).
Line 171: the 700hPa level is not usually considered "mid-tropospheric", as it still bears substantial influences from the surface (e.g., see the North Atlantic "warming hole"). What about 600 or 500hPa?
Line 193: it looks to me more vertically homogeneous with a relative maximum around 850hPa, while the magnitude of the warming trend near the surface does not really stand out. The CMIP6 trend instead is clearly top-heavy.
Fig. 4: Accumulating precipitation in the presence of very extreme peaks might lead to interpretation problems, as changes in single precipitation maxima should not be seen as representative of the behaviour of large-area averages. The color scale of precipitation does not increase monotonically, further complicating the interpretation. Please provide spatial precipitation difference maps between simulations to actually show that there is a systematic reduction of precipitation and not a shift or just a loss of individual maxima.
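The concern about shifted maxima can be illustrated with a purely synthetic example. In the Python sketch below, an idealized rainfall maximum is displaced by 40 km between two fields of identical total rainfall; the domain size, grid spacing, peak amplitude, and evaluation box are all hypothetical choices of mine, not values from the manuscript.

```python
import numpy as np

def gaussian_peak(x, y, x0, y0, amp=400.0, width=20.0):
    """Idealized rainfall maximum [mm] centred at (x0, y0); distances in km."""
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * width ** 2))

# Synthetic 400 km x 400 km domain at 2 km spacing (not model output)
x, y = np.meshgrid(np.arange(0.0, 400.0, 2.0), np.arange(0.0, 400.0, 2.0))
factual = gaussian_peak(x, y, 200.0, 200.0)
counterfactual = gaussian_peak(x, y, 160.0, 200.0)  # same storm, displaced 40 km

# Difference map: a dipole reveals a shift, a one-signed pattern a systematic change
diff_map = factual - counterfactual

# Relative change (F - C) / F in a 60 km box around the factual maximum
box = (np.abs(x - 200.0) <= 30.0) & (np.abs(y - 200.0) <= 30.0)
box_change = (factual[box].mean() - counterfactual[box].mean()) / factual[box].mean()

# Same metric over the whole domain
domain_change = (factual.mean() - counterfactual.mean()) / factual.mean()

print(f"box-mean change:    {box_change:+.2f}")
print(f"domain-mean change: {domain_change:+.2f}")
```

In this idealized case the box around the factual maximum shows a large apparent reduction in the counterfactual although the domain total is essentially unchanged, which is why difference maps are needed to separate a displacement of the maxima from a systematic change in precipitation.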