This work is distributed under the Creative Commons Attribution 4.0 License.
Pre-training for Deep Statistical Climate Downscaling: A case study within the Spanish National Adaptation Plan (PNACC)
Abstract. Deep Learning (DL) has recently emerged as a promising approach for statistical climate downscaling. In this study, we investigate the use of pre-training in this context, building on the DeepESD model developed for the Spanish National Adaptation Plan (PNACC), which uses ERA5 predictors and the 5 km ROCIO-IBEB national gridded predictand dataset. We evaluate the effectiveness of different fine-tuning strategies to adapt this pre-trained model to alternative regional predictand datasets, specifically a point-based station dataset. The objective is to develop downstream downscaling methods that maintain consistency with the original national-scale model while capturing the specific characteristics of regional and local datasets.
We analyze the benefits of fine-tuning in terms of faster convergence, improved generalization, and greater consistency. Using eXplainable Artificial Intelligence (XAI) techniques, we examine the relationships learned by the models and compare the resulting climate change signals. Our results demonstrate that pre-training provides a robust foundation for statistical downscaling, particularly in cases with limited spatial and/or temporal data availability (e.g., local high-resolution datasets available only for short periods), thereby reducing epistemic uncertainty and improving the reliability of future climate projections. Overall, this approach represents a step toward standardizing DL-based downscaling models to ensure more coherent and consistent climate projections across national and regional scales.
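A rough illustration of the training strategies compared in the study (pre-training, partial fine-tuning, full fine-tuning, and full training) is sketched below in PyTorch: a pre-trained convolutional feature extractor is either frozen or kept trainable while a dense head is fitted to the new predictand dataset. Layer sizes, names, and the choice of framework are illustrative assumptions only and do not reproduce the actual DeepESD implementation.

```python
# Minimal sketch of the training strategies (illustrative, not DeepESD code).
import torch
import torch.nn as nn

class DownscalingCNN(nn.Module):
    def __init__(self, n_channels: int, height: int, width: int, n_sites: int):
        super().__init__()
        # Convolutional feature extractor (pre-trained on the national dataset).
        self.extractor = nn.Sequential(
            nn.Conv2d(n_channels, 50, 3, padding=1), nn.ReLU(),
            nn.Conv2d(50, 25, 3, padding=1), nn.ReLU(),
            nn.Conv2d(25, 10, 3, padding=1), nn.ReLU(),
        )
        # Dense head mapping flattened features to the target sites; this part
        # is always (re-)trained on the new predictand dataset.
        self.head = nn.Linear(10 * height * width, n_sites)

    def forward(self, x):
        return self.head(self.extractor(x).flatten(start_dim=1))

def configure(model: DownscalingCNN, strategy: str) -> torch.optim.Optimizer:
    """Return an optimizer implementing one of the training strategies."""
    if strategy == "partial-fine-tuning":
        # Freeze the pre-trained extractor; only the new head is updated.
        for p in model.extractor.parameters():
            p.requires_grad = False
        params = model.head.parameters()
    elif strategy in ("full-fine-tuning", "full-training"):
        # All parameters are updated; "full-training" simply starts from
        # randomly initialised weights instead of the pre-trained extractor.
        params = model.parameters()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return torch.optim.Adam(params, lr=1e-4)
```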
Status: open (until 13 Nov 2025)
- CEC1: 'Comment on egusphere-2025-3754 - No compliance with the policy of the journal', Juan Antonio Añel, 11 Oct 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy" (https://www.geoscientific-model-development.net/policies/code_and_data_policy.html).
To access part of the data used in your work, you have linked web pages from AEMET. However, these are not suitable repositories for scientific publication, and we cannot accept them. Therefore, the current situation with your manuscript is irregular. Please publish your data in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
You must also include a modified 'Code and Data Availability' section in a potentially revised manuscript, containing the information on the new repositories.
I must note that if you do not fix this problem, we cannot accept your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
- AC1: 'Reply on CEC1', José González-Abad, 13 Oct 2025
Dear Juan A. Añel,
We apologize for the oversight. All data previously linked through the AEMET URL, corresponding to the predictands used to train the model, have now been made available via the following Zenodo repository: https://zenodo.org/records/17338349 (DOI: 10.5281/zenodo.17338349). We commit to updating the corresponding section to reflect this change in the next revision of the manuscript.
Best,
Jose
Citation: https://doi.org/10.5194/egusphere-2025-3754-AC1
- CEC2: 'Reply on AC1', Juan Antonio Añel, 13 Oct 2025
Dear authors,
Many thanks for the quick reply and for solving the pending issues regarding the data.
We can now consider the current version of your manuscript to be in compliance with the policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-3754-CEC2
- RC1: 'Comment on egusphere-2025-3754', Anonymous Referee #1, 23 Oct 2025
This study presents an important exploration of using pre-trained deep learning (DL) models for climate downscaling, aiming to maintain physical consistency between large-scale predictors and localized datasets. By systematically testing multiple training strategies (pre-training, partial fine-tuning, full fine-tuning, and full training), the authors demonstrate the robustness and efficiency of applying pre-trained models on the station-based dataset. However, as the authors note in the discussion, “this benefit does not necessarily translate into improved accuracy on STATIONS-IBEB, likely due to the presence of higher and more localized extreme values, which are more challenging to model than their smoothed counterparts in the interpolated ROCIO-IBEB gridded dataset.” This observation raises a critical issue: while pre-training improves efficiency and generalization, it may limit the model’s ability to capture localized extremes that define station-based observations. Clarifying this trade-off would deepen the study’s insight into how pre-trained DL models balance physical consistency and predictive reliability in downscaling applications. The following comments aim to clarify and deepen several aspects of this discussion.
- The key distinction between ROCIO-IBEB and STATIONS-IBEB lies in their treatment of local extremes. Since station-based datasets inherently preserve localized weather phenomena, it would be helpful to elaborate on the rationale for using STATIONS-IBEB as the downscaling target and to contrast its statistical characteristics—particularly the distribution tails representing extreme events—with those of ROCIO-IBEB. This clarification would highlight the physical implications of transferring knowledge between datasets with distinct spatial and statistical properties.
- Although fine-tuned models converge faster and achieve comparable performance to fully-trained models in terms of RMSE and mean bias, Figure 4 (right column) suggests that fully-trained models perform slightly better for extreme metrics such as TXx and TNn. This raises an important question about the ability of pre-trained models to represent localized extremes, which are critical for reliable high-impact weather downscaling. A focused evaluation of model skill over the extreme subsets of both ROCIO-IBEB and STATIONS-IBEB (a minimal sketch of such a check is given after this list) would help determine whether performance limitations stem from the coarse representation of extremes in the pre-training data or from the fine-tuning process itself, which may not fully adapt to station-scale variability.
- The aggregated saliency map results reveal differences between full-training and pre-trained models, yet it is unclear whether these reflect meaningful large-scale dependencies capable of inferring local extremes or potential overfitting to dominant features. Providing examples of regional or event-specific saliency maps, rather than only aggregated values, would clarify whether the learned features correspond to physically interpretable meteorological patterns or spurious correlations introduced during training.
- In Section 4.4, fine-tuned models trained with datasets containing varying fractions of missing data show lower RMSE values, attributed to pre-learned representations improving generalization. However, such pre-training could potentially smooth out localized extremes in unseen data. Evaluating performance specifically under extreme conditions in these incomplete datasets would strengthen the interpretation of how pre-training affects robustness and physical fidelity when data coverage is limited.
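The kind of extremes-focused check suggested above could look like the following sketch, assuming daily predictions and observations stored as xarray DataArrays with dimensions ("time", "station"); the variable and function names are hypothetical and serve only to make the requested diagnostic concrete.

```python
# Illustrative only: per-station bias of extreme indices (TXx, TNn),
# assuming daily data as xarray DataArrays with dims ("time", "station").
import xarray as xr

def txx(tasmax: xr.DataArray) -> xr.DataArray:
    """TXx: annual maximum of daily maximum temperature."""
    return tasmax.resample(time="YS").max()

def tnn(tasmin: xr.DataArray) -> xr.DataArray:
    """TNn: annual minimum of daily minimum temperature."""
    return tasmin.resample(time="YS").min()

def extreme_index_bias(pred: xr.DataArray, obs: xr.DataArray, index) -> xr.DataArray:
    """Mean bias of an extreme index (e.g. txx or tnn) per station."""
    return (index(pred) - index(obs)).mean("time")

# Example usage on a station-based test set (e.g. STATIONS-IBEB):
# bias_txx = extreme_index_bias(pred_tasmax, obs_tasmax, txx)
# bias_tnn = extreme_index_bias(pred_tasmin, obs_tasmin, tnn)
```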
Overall, this study makes a valuable contribution to understanding how pre-trained DL models can be adapted for regional climate applications. Further analysis focusing on extreme events and saliency-based interpretability would enhance confidence in the approach and clarify the trade-offs between maintaining physical consistency and capturing localized, high-impact weather phenomena.
Citation: https://doi.org/10.5194/egusphere-2025-3754-RC1
- RC2: 'Comment on egusphere-2025-3754', Anonymous Referee #2, 24 Oct 2025
In this study, the authors aim to explore the benefits of transfer learning in the context of ML-based statistical downscaling for climate applications. Specifically, the goal is to understand the benefits of pre-training the latent representations of ML-based downscaling models on a core dataset, to achieve improved skill or higher consistency when the downscaling model is fine-tuned for a different downscaling task.
This topic is certainly of practical and scientific interest for climate modelers, and a good fit for the journal. However, I find the implementation of this study to be largely uninformative regarding the questions posed by the authors in the abstract and introduction. In my view, this is due to the choices made by the authors for the pre-training and fine-tuning datasets, as well as the forms of fine-tuning explored for the DeepESD model. For these reasons, which I detail below, I find this manuscript in its current form unsuitable for publication. I encourage the authors to find a more effective lens through which the questions that they set out to study can be answered.
Major comments:
- Transfer learning is a well-established way to fine-tune large ML models, pre-trained on an extensive pre-existing dataset, on a smaller dataset that is more representative of some final task. The manuscript instead explores pre-training roughly 17,000 parameters of the final DeepESD models, out of a total of ~4.4M parameters for temperature and ~7.5M parameters for precipitation, respectively. This can hardly be called fine-tuning, when the pre-trained parameters represent less than 1% of the total number of parameters in both cases (see the short calculation after this list). This partly explains why all the variants yield statistically equivalent results (Figures 4-7), and why no conclusions can be drawn from this experimental setup.
- The final target dataset of interest, STATIONS-IBEB, is used to construct the pre-training dataset, ROCIO-IBEB. This setup omits the most important practical aspect of transfer learning: can we train on one dataset to improve predictive skill on a different dataset? This is an important question for some of the applications cited by the authors: pre-train on a national-level dataset, and fine-tune on a local dataset with different statistics (Taboada et al., 2024). The current setup is too idealized and not representative of the situations where transfer learning may actually be useful, in my opinion.
- Figure 3 shows that pre-training does not yield improved results, only faster training for a relatively inexpensive model where training cost is not really an issue. The rest of the results also fail to show any positive effects of pre-training. I think this is because at the level of the high-level representations of the data learned by the convolutional layers, the datasets ROCIO-IBEB and STATIONS-IBEB are largely indistinguishable (since they share the same data sources). This leads me to believe that the improved skill of fine-tuned models when the STATIONS-IBEB dataset is artificially shrunk (Fig 9) is due to the fact that you are actually showing a very similar version of the omitted samples to the model through ROCIO-IBEB. This is another example of why useful conclusions cannot be drawn given the similarity of the two datasets considered.
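For reference, the fraction of pre-trained parameters implied by the counts quoted in the first major comment (taken from the manuscript) can be verified with a short calculation; the figures are approximate.

```python
# Back-of-the-envelope check of the pre-trained parameter fraction,
# using the approximate counts quoted above.
pretrained = 17_000        # parameters in the pre-trained feature extractor
total_temp = 4_400_000     # total parameters, temperature model
total_precip = 7_500_000   # total parameters, precipitation model

print(f"temperature:   {pretrained / total_temp:.2%}")    # ~0.39%
print(f"precipitation: {pretrained / total_precip:.2%}")  # ~0.23%
```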
Minor comments:
- L38-40: "Diverging outcomes, which may confuse users". Improving consistency at the expense of capturing the true uncertainty of regional climate projections is actually a disservice to the downstream users, because it leads to biased estimates of risk.
- L82: Describing a 1 km resolution downscaled dataset spanning thousands of years and supported by extremely sparse observations in those terms is a stretch (Karger et al., 2023).
- L123: "the most widely used in the downscaling literature". I don't think this method is that well established (60 studies reference it), so this needs to be toned down. There are studies from 2024 on downscaling with more references (e.g., CorrDiff), and deterministic ML-based downscaling models are not representative of the state of the art anymore.
- The version of DeepESD used in this paper is different from the one introduced by Baño-Medina et al. (2022) for temperature. The MSE loss assumes a homogeneous uncertainty estimate, unlike the original Gaussian log-likelihood where the variance is explicitly modeled (see the sketch after this list). I would also say that the deterministic MSE is no longer a "widely adopted" loss in downscaling due to its tendency to smooth fields in space and underestimate extremes.
- Fig 2: The legend reads "full-tuning" for the right column; it should read "full fine-tuning".
- Fig 3: Is this the training loss or the validation loss? If the former, please change to the validation loss, which is more representative of operational skill. Otherwise, please show both.
- L212: "take about half the number of epochs": Certainly not to reach the best final skill, since the fully trained model is better. How are you defining a common final time for all models to assert this?
- The results shown in Figures 4 and 5 for the ROCIO-IBEB dataset are not comparable to those in the STATIONS-IBEB dataset, since the former is a smoothed interpolation of the latter. Errors on the ROCIO-IBEB dataset will always be lower.
- Figure 9: I think the legend should refer to different versions of STATIONS-IBEB, not ROCIO-IBEB.
- Discussion: I do not see where the study demonstrates "the potential of pre-training" (L303), or where it supports the affirmation that "fine-tuning the extractor appears to be beneficial". Beyond Figure 9, which has some issues I raised before, the other results are largely equivalent for all variants.
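As a minimal illustration of the loss-function distinction raised above, the following sketch contrasts a homoscedastic MSE with a Gaussian negative log-likelihood in which the variance is predicted explicitly; the tensors are dummy data and PyTorch is assumed purely for illustration.

```python
# Homoscedastic MSE vs. heteroscedastic Gaussian NLL (dummy data).
import torch
import torch.nn as nn

y = torch.randn(8, 100)        # observations
mu = torch.randn(8, 100)       # predicted mean
log_var = torch.zeros(8, 100)  # predicted log-variance (heteroscedastic term)

mse = nn.MSELoss()(mu, y)                         # fixed, homogeneous variance
nll = nn.GaussianNLLLoss()(mu, y, log_var.exp())  # variance modelled per point
print(float(mse), float(nll))
```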
Citation: https://doi.org/10.5194/egusphere-2025-3754-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 925 | 79 | 23 | 1,027 | 15 | 13 |