the Creative Commons Attribution 4.0 License.
SDMBCv2 (v1.0): correcting systematic biases in RCM inputs for future projection
Abstract. Regional Climate Models (RCMs) offer enhanced spatial resolution and a more realistic depiction of local climate processes. However, they often inherit systematic biases from their driving Global Climate Models (GCMs), which can compromise the accuracy of downscaled climate projections. To address this, bias correction techniques have been widely employed to adjust GCM and RCM outputs, particularly for climate impact and adaptation studies. Traditional methods, however, typically correct surface variables independently and lack physical and dynamical consistency. Bias correcting GCM boundary conditions prior to RCM simulation ensures a more coherent, physically and dynamically consistent, regional climate simulation with reduced errors. This study evaluates the effectiveness of such an approach using a calibration/validation framework, demonstrating significant error reduction during the validation (out-of-sample) period compared to uncorrected GCM data. We present an updated version of the open-source Python package, Sub-Daily Multivariate Bias Correction (SDMBC) v2, designed to correct RCM input variables using both reanalysis and raw GCM datasets. Enhancements include support for future climate projections, flexible horizontal and vertical interpolation for compatibility with diverse datasets, and a fully Python-based architecture optimized for parallel processing and high-performance computing. This paper illustrates the software's capabilities and provides a practical application example.
Status: open (until 18 Apr 2026)
- RC1: 'Comment on egusphere-2025-6411', Anonymous Referee #1, 29 Jan 2026
- CEC1: 'Comment on egusphere-2025-6411', Juan Antonio Añel, 04 Feb 2026
Dear authors,
I would like to note that in the "Code and Data Availability" section of your manuscript, you point the reader to a GitHub website to obtain the SDMBCv2 code. GitHub websites are not acceptable for storing assets in scientific publications, and GitHub itself instructs users to use long-term repositories for this purpose instead of citing GitHub sites. Fortunately, in the internal records for the submission of your work, you have provided an acceptable long-term repository, in this case hosted by Zenodo, namely https://zenodo.org/records/17707370. In this regard, I must request that, if the Topical Editor of your manuscript requests additional reviews or decides to accept it for publication, you include the link to the Zenodo repository, and not the one to GitHub, in any reviewed version.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-6411-CEC1
- RC2: 'Comment on egusphere-2025-6411', Anonymous Referee #2, 18 Mar 2026
The manuscript introduces SDMBCv2, an updated Python-based package designed for the bias correction of GCMs. The transition from a Fortran-based legacy script to a parallelized Python architecture is timely and useful. The authors demonstrate the tool's efficacy across multiple timescales and variables, showing significant improvements in capturing the mean state and inter-variable correlations. While the validation framework is robust, using a long (31-year) out-of-sample period, the manuscript requires further clarification of the technical justifications for physical consistency, potential overfitting, and the preservation of climate change signals. I recommend it for publication after Minor Revisions.
Specific Comments:
- Overfitting and Generalizability. As shown in Table 1, there is a striking discrepancy in the Kolmogorov-Smirnov (K-S) test pass rates between the calibration period (100% for all variables) and the validation period (dropping to ~60% for wind speed and ~80% for temperature). A 100% pass rate in calibration often indicates that the statistical mapping is "tightly fitted" to the specific noise of the training period. The significant drop in the independent validation period (1990–2020) suggests a degree of overfitting. The authors should discuss whether the complexity of the nesting framework or the quantile mapping frequency contributes to this drop. Please also revise the language in Section 4 (e.g., Line 456), as "consistently high rates" is somewhat misleading given the ~40% failure rate for wind speed in validation.
- Preservation of Climate Change Signals (Signal Smearing). A critical concern in bias correction is whether the process artificially modifies the GCM's intrinsic climate change signal (e.g., the warming trend or intensification of the hydrological cycle). Since SDMBC v2 applies correction factors derived from a historical period (1959–1989) to a future or more recent period (1990–2020), there is a risk of "signal smearing." If the GCM predicts a legitimate climatic shift that wasn't present in the historical observations, the bias correction might erroneously treat this shift as a "bias" and remove it. Please clarify if the methodology is trend-preserving.
- How does the quantile mapping algorithm handle values in the validation or future periods that fall outside the range of the calibration period? Does it use linear extrapolation, constant shifting, or another method? The treatment of 'new' extremes is a well-known source of uncertainty in bias correction that should be explicitly documented.
- Discussion on Low-Variability Regions (e.g., Africa). The authors' explanation regarding multivariate coupling inducing indirect changes in low-variability regions (Lines 437-445) is insightful. For the benefit of the users, could the authors clarify if SDMBC v2 provides a "safety toggle" or threshold to limit corrections in regions where the observed variability is near zero, to avoid over-amplification of noise?
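As an illustration of the out-of-range question raised above, the following is a minimal sketch of one common option: empirical quantile mapping with a constant shift applied beyond the calibration range. It is not taken from SDMBC v2, whose actual treatment of such values is what the authors are asked to document. The function name `empirical_quantile_map` and the requirement of equal-length calibration samples are hypothetical simplifications.

```python
from bisect import bisect_left


def empirical_quantile_map(model_cal, ref_cal, values):
    """Map `values` from the model distribution onto the reference one.

    Assumption (illustrative only): both calibration samples have the
    same length, so sorted model value k pairs with sorted reference
    value k. Values outside the calibration range receive the constant
    correction found at the nearest end of the range.
    """
    m = sorted(model_cal)
    r = sorted(ref_cal)
    if len(m) != len(r) or len(m) < 2:
        raise ValueError("need two equal-length samples with >= 2 values")

    low_shift = r[0] - m[0]      # correction at the lowest quantile
    high_shift = r[-1] - m[-1]   # correction at the highest quantile

    mapped = []
    for x in values:
        if x <= m[0]:
            mapped.append(x + low_shift)       # constant shift below range
        elif x >= m[-1]:
            mapped.append(x + high_shift)      # constant shift above range
        else:
            i = bisect_left(m, x)              # m[i-1] < x <= m[i]
            w = (x - m[i - 1]) / (m[i] - m[i - 1])
            mapped.append(r[i - 1] + w * (r[i] - r[i - 1]))
    return mapped


corrected = empirical_quantile_map([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], [2.5, 6, -1])
print(corrected)
```

With these toy samples, the in-range value 2.5 is interpolated between neighbouring calibration quantiles, while 6 and -1 lie outside the calibration range and receive the correction found at the nearest edge. Linear extrapolation of the correction is an equally plausible alternative, and the two choices can differ noticeably for new extremes.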
Minor Comments:
Line 68: "Fortran-based script, which consumed significant memory." Please clarify whether the memory issues were inherent to the Fortran language itself (which is typically highly efficient for numerical arrays) or due to the data-handling logic/structures in the legacy version. As written, the phrasing is somewhat ambiguous. I suggest changing this to "legacy implementation" or "inefficient data handling in the previous version" to avoid implying a limitation of the language itself.
Line 79: "core bias correction process remaining in Fortran." It is recommended that the authors specify the interface method used between Python and the Fortran cores (e.g., via f2py, ctypes, or subprocess calls). This technical detail is crucial for users configuring the software environment on High-Performance Computing (HPC) clusters.
Line 180: "eliminating the need for additional processing." This statement may be slightly too absolute, as users still need to manage configuration files (e.g., config.yaml). I suggest softening the phrasing to "simplifying the integration into existing RCM workflows" to more accurately reflect the software's advantage.
Line 197-198: Briefly justify the choice of conservative remapping for specific humidity vs. bilinear for other variables (e.g., to ensure moisture mass conservation).
Line 204-205: "1959–1989 (calibration) and 1990–2020 (validation)" The choice of 31 years is slightly unconventional compared to the standard 30-year WMO climate normal. While not a major issue, a brief mention of why 31 years were chosen (e.g., to include a specific leap year or alignment with ERA5 availability) would show attention to detail.
Line 312-313: "levels where specific humidity approaches zero... were excluded." This is a sound technical decision. However, the authors should specify the exact threshold used for "approaching zero" to allow for exact numerical replication by other researchers.
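The moisture-conservation argument suggested for Lines 197-198 can be illustrated with a one-dimensional toy example. This is a sketch under simplifying assumptions (equal-width cells, an integer coarsening factor, hypothetical function names) and is not the remapping code used by the package. It only shows why an area-weighted average preserves the grid total while interpolation between cell centres need not.

```python
def conservative_coarsen(fine, factor):
    """Average each group of `factor` equal-width cells (area-weighted mean)."""
    if len(fine) % factor:
        raise ValueError("len(fine) must be a multiple of factor")
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine), factor)]


def linear_sample(fine, factor):
    """Linearly interpolate fine cell-centre values at coarse cell centres."""
    sampled = []
    for j in range(len(fine) // factor):
        centre = (j + 0.5) * factor   # coarse centre, in fine-cell units
        pos = centre - 0.5            # fractional index into fine centres
        lo = int(pos)
        hi = min(lo + 1, len(fine) - 1)
        w = pos - lo
        sampled.append((1 - w) * fine[lo] + w * fine[hi])
    return sampled


# A narrow moist layer surrounded by dry cells (arbitrary units).
fine = [0, 0, 9, 0, 0, 0]
factor = 3

cons = conservative_coarsen(fine, factor)
lin = linear_sample(fine, factor)

print("fine total        :", sum(fine))
print("conservative total:", sum(cons) * factor)
print("interpolated total:", sum(lin) * factor)
```

In this deliberately extreme case the interpolated field misses the moist cell entirely, whereas the conservative average spreads it over the coarse cell and keeps the total unchanged.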
Citation: https://doi.org/10.5194/egusphere-2025-6411-RC2
Data sets
SDMBC v2 – Input and Output Datasets (Version 1.0) Youngil Kim and Jason Evans https://doi.org/10.5281/zenodo.17577882
Model code and software
young-ccrc/sdmbc_v2: SDMBC v2 – Version 1.0: Script Release for GMD Manuscript Submission Youngil Kim https://doi.org/10.5281/zenodo.17707370
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 215 | 69 | 23 | 307 | 47 | 13 | 16 |
RC1: 'Comment on egusphere-2025-6411', Anonymous Referee #1, 29 Jan 2026
The authors introduce the Python-based software package called SDMBCv2, which is designed for bias-correcting global climate models (GCMs) prior to input into regional climate models (RCMs) for dynamical downscaling. Using the ERA5 reanalysis dataset as their "observation" dataset, they were able to show general improvements to MAE and K-S scores after bias correcting the ACCESS-ESM1.5 GCM. The SDMBCv2 software can thus be useful for researchers conducting dynamical downscaling. In general, the article was well written and presented, and I recommend it for publication after Minor Revisions, as well as addressing some questions about the overall utility/performance of the software.
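For readers less familiar with the two metrics mentioned above, the following is a minimal sketch of how a mean absolute error and a two-sample Kolmogorov-Smirnov (K-S) check could be computed. The function names are hypothetical, the pass criterion uses the standard large-sample critical value, and the manuscript's own test configuration may differ.

```python
import math
from bisect import bisect_right


def mean_absolute_error(model, reference):
    """Mean of |model - reference| over paired values."""
    return sum(abs(x - y) for x, y in zip(model, reference)) / len(model)


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_passes(a, b, alpha=0.05):
    """True if the samples are not distinguishable at level `alpha`.

    Uses the asymptotic critical value, which is only approximate
    for small samples.
    """
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= d_crit


reference = [1, 2, 3, 4]
corrected = [1, 2, 3, 4]   # toy "bias-corrected" sample
raw = [5, 6, 7, 8]         # toy "raw GCM" sample

print(mean_absolute_error(raw, reference), ks_passes(raw, reference))
print(mean_absolute_error(corrected, reference), ks_passes(corrected, reference))
```

A pass rate such as those reported in Table 1 would then be the fraction of grid cells (or levels, or time slices) for which a check of this kind succeeds.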
Scientific Questions:
1.) How long does it take to run SDMBCv2? I do wonder about its performance as a Python package. Though I understand the accessibility of a Python package for a wider audience, most researchers working with GCMs/RCMs would already have familiarity with Fortran. From a scalability standpoint, especially across ensembles of GCMs and for different CMIP6 pathways, wouldn't a Fortran package be more useful? Though I understand that xESMF and CDO form the core of SDMBCv2, I would appreciate it if the authors could comment on performance improvements (or penalties) vs. a pure Fortran approach (e.g. SDMBCv1).
2.) Have the authors used SDMBCv2 for GCMs other than ACCESS? In particular, would the reduction in pass rate for q at hour 18 (Table 1) persist with other GCMs? More generally, can SDMBCv2 be generalized to successfully bias correct other GCMs?
3.) Have the authors tried running an RCM using the bias-corrected ACCESS? From a historical climate downscaling perspective, what advantage would this have over, say, downscaling from a reanalysis dataset directly?
4.) Related to 3.), one of the big assumptions in bias correcting a climate dataset is stationarity of the bias into the future. Given that the improvements in metrics in 1990-2020 are fairly moderate after bias correction, what assumptions can be made regarding the performance of this approach for future years (either for the GCM directly or after downscaling with an RCM)? As a corollary, have the authors tried different calibration and/or testing periods to confirm any temporal aspects of their bias correction approach?
5.) Also related to 3.): there was an emphasis on being able to represent extreme events in the paper. Has this been verified, i.e. that this bias-correction approach could improve the representation of 95+-percentile events, particularly after downscaling with an RCM?
Minor Comments:
- Recommend that any reference to "Observations" or "observed dataset" be switched to "Reanalysis"
- e.g. Line 140: "raw" ---> "GCM"
- Why is SST evaluated on a seasonal timescale (Figure 2) while the variables are evaluated on a daily timescale (Figure 3)?
- In Figure 5, the bias-corrected plots show sizeable biases. How is the computed MAE 0.0?
- Table 1: SDMBC ---> should be SDMBCv2
- Line 440: "It's possible" ---> "It is possible"