the Creative Commons Attribution 4.0 License.
Bias Correcting Regional Scale Earth Systems Model Projections: Novel Approach using Empirical Mode Decomposition
Abstract. Bias correction is a crucial step in using Earth systems model outputs for assessments, as it adjusts systematic errors by comparing the model to observations. However, standard methods, ranging from mean-based linear scaling to distribution-based quantile mapping, typically treat bias correction as a single-scale process, overlooking the fact that biases can manifest differently across daily, seasonal, and annual timescales. In this study, we propose a novel, timescale-aware bias-correction approach built on Empirical Mode Decomposition (EMD). By decomposing the meteorological signal into multiple oscillatory components and aggregating them to represent distinct timescales, we apply targeted corrections to each component, thereby preserving both short- and long-term structure in the data. Experimental validations demonstrate that this finer-grained method substantially improves upon existing bias-correction techniques such as quantile mapping. As a result, the proposed approach offers a more robust path to accurate and reliable Earth systems projections, strengthening their utility for resilience and adaptation planning.
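The idea in the abstract (split the signal into timescales, correct each one separately, then recombine) can be illustrated with a minimal sketch. This is not the authors' EMDBC code: a centered moving average stands in for the EMD/EEMD grouping of oscillatory components, plain empirical quantile mapping stands in for the per-timescale corrections, and every function name, the 31-day window, and the synthetic series are assumptions made for illustration only.

```python
import math
import random


def moving_average(x, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def split_timescales(x, window):
    """Return (slow, fast) with slow + fast == x elementwise.
    Assumption: a crude stand-in for grouping EMD/EEMD components."""
    slow = moving_average(x, window)
    fast = [xi - si for xi, si in zip(x, slow)]
    return slow, fast


def quantile_map(model_hist, obs_hist, values):
    """Empirical quantile mapping: send each value to the observed value
    holding the same empirical rank it has in the historical model sample."""
    import bisect
    m_sorted, o_sorted = sorted(model_hist), sorted(obs_hist)
    n_m, n_o = len(m_sorted), len(o_sorted)
    mapped = []
    for v in values:
        rank = bisect.bisect_right(m_sorted, v)   # count of model values <= v
        idx = -(-rank * n_o // n_m) - 1           # integer ceiling, 0-based
        mapped.append(o_sorted[min(n_o - 1, max(0, idx))])
    return mapped


def timescale_aware_correction(model, obs, window=31):
    """Correct the slow and fast components separately, then recombine."""
    m_slow, m_fast = split_timescales(model, window)
    o_slow, o_fast = split_timescales(obs, window)
    slow_c = quantile_map(m_slow, o_slow, m_slow)
    fast_c = quantile_map(m_fast, o_fast, m_fast)
    return [s + f for s, f in zip(slow_c, fast_c)]


# Synthetic three-year daily series (hypothetical data, not from the paper):
# the "model" is too warm, has a damped seasonal cycle, and is too noisy.
rng = random.Random(0)
days = range(3 * 365)
obs = [15 + 10 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 2.0) for t in days]
model = [18 + 7 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 3.5) for t in days]
corrected = timescale_aware_correction(model, obs)
```

Because each component is mapped onto the observed distribution of the same component, the mean bias and the mismatch in short-term variability are treated separately instead of through a single transfer function, which is the motivation the abstract gives for a timescale-aware approach.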
Status: final response (author comments only)
-
CEC1: 'Comment on egusphere-2025-1112 - No compliance with the policy of the journal', Juan Antonio Añel, 08 Apr 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on a Git site in a server hosted by the Argonne National Laboratory. However, it is not a suitable repository for scientific publication and as a result your manuscript does not comply with the policy of our journal. Therefore, the current situation with your manuscript is irregular, as we can not accept manuscripts in Discussions that do not comply with our policy.
We ask you to publish your code in one of the appropriate repositories listed in our policy and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible.
Something similar applies to the data. You have not published the necessary specific data to replicate the work presented in your manuscript, but simply point to a data portal hosted by the ANL. We can not accept it: first, because again it is not a suitable repository for scientific publication; second, because it does not clearly identify the exact data that you have used, making it hard for readers to access. Therefore, you must deposit the necessary input and output data from your work in one of the suitable repositories, and reply to this comment with the relevant information (again, link and permanent identifier).
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the information for the new repositories. Also, I have to note that the Git site for the Python scripts does not have a license listed. If you do not include a license the code remains your property, and nobody can use it. Therefore, when uploading your code to the repository, please, choose a free software/open-source (FLOSS) license. For example, if you use the GPLv3 you simply need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Also, you can choose other options such as: GPLv2, Apache License, MIT License, etc.
Finally, I have to note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1112-CEC1 -
AC1: 'Reply on CEC1', Arkaprabha Ganguli, 19 Apr 2025
Thank you for highlighting the gaps in our code and data availability section. We recognize the importance of full compliance with GMD’s Code and Data Policy. We are currently preparing a public GitHub repository (under an open‑source license) for EMDBC, as well as a Zenodo archive for all input and output data supporting our manuscript. We expect both to be finalized by early next week and will post the DOI links here as soon as they are available.
In the meantime, could you please advise us on the procedure for updating the existing preprint on the GMD server with the revised Code and Data Availability section once our repositories are live? Should we send the updated manuscript directly to the GMD editorial office via email, or is there a different mechanism to replace the preprint?
Thank you for your assistance. We look forward to bringing our submission into full compliance.
Citation: https://doi.org/10.5194/egusphere-2025-1112-AC1 -
CEC2: 'Reply on AC1', Juan Antonio Añel, 19 Apr 2025
Dear authors,
Given your reply, I have to insist that Git repositories are not suitable for scientific publication. In your reply you say that you are preparing a new GitHub repository, and this is not going to solve the pending issues with your manuscript. It is necessary that all the code is hosted in a long term repository suitable for scientific publication, such as the Zenodo one that you mention you will use for the data.
At this stage you do not need to update your manuscript, but reply to this comment with the information for the repositories (link and permanent identifiers (e.g. DOI)) where you have deposited both code and data, and a tentative new text for the "Code and Data Availability" section. In this way the information will be public and available to anyone. If your manuscript undergoes additional peer-review, and the Topical Editor asks you for a revised version of it, or accepts it for publication, you will have the opportunity to modify the manuscript at that stage.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1112-CEC2 -
AC2: 'Reply on CEC2', Arkaprabha Ganguli, 24 Apr 2025
Dear Mr. Añel,
Thank you for your clarification. We agree that the GitHub repository alone does not meet GMD’s Code and Data Policy requirements. In response to your comment, we have created a Zenodo repository that includes a frozen version of our codebase and the full dataset used in the manuscript. Accordingly, we have updated the “Code and Data Availability” section, which now reads:
'''
All Python scripts for the Empirical Mode Decomposition-based Bias Correction, the full-domain WRF-CCSM dataset used in this manuscript, and the validation areas mapping WRF-CCSM indices to 25×25 case study regions are available in a Zenodo repository at https://doi.org/10.5281/zenodo.15244202 (Ganguli et al., 2025). Livneh daily CONUS observational data (Livneh et al., 2013), provided by NOAA Physical Sciences Laboratory (NOAA-PSL) in Boulder, Colorado, USA, are available at https://psl.noaa.gov/data/gridded/data.livneh.html (NOAA-PSL, 2013). For Livneh, daily mean temperatures are computed as the average of the daily minimum and maximum values. Finally, the Empirical Mode Decomposition-based Bias Correction code is also available in the EMDBC GitHub repository at https://github.com/jeremyfifty9/emdbc (Ganguli and Feinstein, 2025).
'''
As Zenodo requires a formal publication step to finalize the repository, we have generated a shareable draft link for editors, reviewers, and the public to preview. Should the manuscript be accepted, we will formally publish the Zenodo archive to finalize the citation. In the meantime, the draft version can be accessed at the following link: https://zenodo.org/records/15244202?preview=1&token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjA2NDAxYmEyLTIzYzYtNGVjYy04YmU2LWQ4YmVmMzM0OGUzNCIsImRhdGEiOnt9LCJyYW5kb20iOiIwMTJlYTcwY2Q1NDU5ZDhkM2Y5YjU0MjBlY2RmMDNmNSJ9.-J8j9pZ1K4Zm7Y1KoUBVMFmp6QsbE6k9s0Gffv_eSmLwrM2MvuPW9xbJL_d9mSd3zYM6ni13wAAPMQ3VXkCNFQ.
Thank you for your assistance. We believe this revision strengthens the manuscript by ensuring long-term accessibility of our materials.
Citation: https://doi.org/10.5194/egusphere-2025-1112-AC2 -
CEC3: 'Reply on AC2', Juan Antonio Añel, 24 Apr 2025
Dear authors,
Unfortunately, after checking the links and DOIs that you provide for your repository, I have seen that the intended repository is not formally published. The full link that you provide (the three lines one) works, but points to a Zenodo site which does not even contain a version, as it is listed as unpublished. Therefore, you must solve this issue, and make the repository public before we can consider the issues I pointed out before solved.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1112-CEC3 -
AC3: 'Reply on CEC3', Arkaprabha Ganguli, 25 Apr 2025
Thank you for your patience and for checking our repository links. We have now formally published the Zenodo archive and confirm that it is live with a proper version and DOI. The updated repository details are:
- Zenodo DOI: 10.5281/zenodo.15244201
- Link to the Zenodo page: https://doi.org/10.5281/zenodo.15244202
The updated “Code and Data Availability” section in my previous comment reflects this DOI. Please let us know if there is anything further needed from us.
Thank you again for your guidance.
Citation: https://doi.org/10.5194/egusphere-2025-1112-AC3 -
CEC4: 'Reply on AC3 - No compliance with the policy of the journal', Juan Antonio Añel, 09 May 2025
Dear authors,
Thanks for addressing the issues regarding the code and data policy. I have checked your repository, and we can consider now the current version of your manuscript in compliance with the Code and Data Policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1112-CEC4
-
RC1: 'Comment on egusphere-2025-1112', Anonymous Referee #1, 12 Jun 2025
This paper proposes a complex new way to do bias correction as a function of time scale. They claim that this technique is useful for impact analysis and improves on older simpler methods. But the paper does not provide any examples of this. It would greatly benefit from a couple examples where the new technique provides significantly improved understanding of the effects of climate change on a particular impact.
Why does the paper use such old simulations to demonstrate the technique?
The paper seems to ignore the diurnal cycle. If you are going to bias correct based on time scales, this needs to be included. The daily maximum and minimum temperatures are very important for many impacts. Just using daily means is not sufficient for temperature. And why is bi-weekly the shortest timescale? Many impacts depend on daily or subdaily timescales. Some extremes, which are very important, occur on these short time scales, including heatwaves and floods.
Without addressing the above items, it is difficult to determine whether this paper is useful or not.
Figure 1 has several problems. It is colored, but there is no information about what the colors mean. The caption that a colorbar is not needed is absolutely wrong. What is the variable and how was it calculated? Also, there are no x- and y-axes. And the edges of the boxes are not discernible. There needs to be a table with each region and its location with the lat and lon of each edge.
The paper has very many acronyms, and learning all of them makes the paper hard to read. And some of them are not defined at all. If the terms are not used multiple times, don’t use acronyms and just write them out. And it would help to include an appendix with a list of the acronyms and their definitions.
What about spatial scale? Usually bias correction is paired with downscaling. Are they independent? Does it matter in what order they are done?
Figure 2 has too many panels with tiny font that is illegible. I can’t read the subpanels in the first row. The text in the last row is too tiny. And what is this time series? Please explain where the data for the observations and model came from.
Line 203 says, “At longer timescales (seasonal or annual), biases often manifest in more systematic patterns that persist across multiple years.” but the authors provide no evidence for this. Why do you think the climate system would behave this way? Please provide references.
The paper uses “validation” in many places where it should be “evaluation.” Please correct these. Validation means that you already know the results are valid.
Figures 3 and 4 are missing units. And what is Wasserstein?
The first three panels in Fig. 5 just show lots of colored lines, and it is impossible to compare them. The last row only has 4 lines, but the colors are similar and again it is hard to figure out which line is which. And what is MSE and what are its units? Why are the last three panels centered on 0? Are these anomalies? With respect to what?
For the example in Fig. 5, the largest differences in the annual temperature are just for three years. Why, and how does this skew the average scores?
Why are some plots semi-annual and others annual?
In Fig. 6 in the last row, I can’t tell any differences in the distributions. Are the differences really significant?
Figs. 7-9: There is not enough information to understand what is plotted. What are the exact time periods? What are the sources of the data?
Also, any response has to address the 30 comments in the attached annotated manuscript.
-
AC4: 'Reply on RC1', Arkaprabha Ganguli, 28 Jun 2025
Thank you for your careful and constructive review. We appreciate the time and effort you devoted to evaluating our work. We attach here our point-by-point response letter. We hope our responses address your concerns, and we would be happy to provide any additional clarification if needed.
-
RC2: 'Comment on egusphere-2025-1112', Anonymous Referee #2, 23 Jun 2025
The authors have combined various existing bias correcting methods of the literature to account for timescale-aware bias corrections. They compared a model to a set of observations on a historical period and propagated the bias correction to mid- and long-term predictions. For some timescales, this bias correction method improves upon existing methods, on one specific region studied here and for an atmospheric model. The paper reads well and has many figures illustrating the results.
However, the text and the figures should be improved to clarify the method, the results and their performance. Thank you.
Here are some general comments:
- Please clarify the novelty of the method: combining several methods (name them) into one framework?
- To improve the performance study it would be helpful to broaden the evaluation: why is a regional dataset enough (USA)? what about using a larger spatial extent (global instead of USA)? Can the method be compared to other methods (mentioned in the Introduction) to illustrate its performance? If only one model is used in the end, add name of the model in the title.
- When explaining the methods, one or several schematics would help -> which method is used for which time scale, how is the data used for the evaluation, etc…
- The figures should be more understandable, some graphs are not readable, all plots should be commented. Please add a letter to each sub-figure and refer to the letter in the text.
Here are some specific comments:
L5: “Meteorological signal”-> “Atmospheric variables” ? Which variables / components of the earth system model? which time scales? Which area (USA)?
L7: “Experimental validations demonstrate that this finer-grained method substantially improves upon existing bias-correction techniques such as quantile mapping“: it was not demonstrated. “illustrates”? And add on which timescale it improves.
L8-10: add the limitations of the method (see conclusion).
Introduction
L 21. “Unlike statistical downscaling”: first define statistical downscaling.
L 25. “Region-level modeling”: North America Mearns et al. 2012, North American component of the Coordinated Regional Downscaling Experiment NA-CORDEX, Mearns et al, 2017 -> add other examples on other regions and other authors?
L 29-31: “Despite these improvements… from forcing data and inherent systematic errors such as those ….”-> add references for the source of the RCMs biases
L47. QM method: transfer function based on the quantile distribution, daily values -> not necessarily daily?
L 51. QM improve model accuracy for both mean and extreme events “(Wood, 2002; Wood et al., 2004; Boé et al., 2007; Piani et al., 2009, 2010; Ashfaq et al., 2010; Teutschbein and Seibert, 2012; Gudmundsson, 2012)” : split the references between the type of application
L61: “generally parallels that of quantile-based techniques and does not address the core challenge of biases that occur across multiple distinct timescales” -> “generally” suggests it is well known in the literature -> add other references. What does Dhawan et al 2024 show?
L62. Biases different at daily monthly seasonal annual scales -> different in what sense? not multi annual? Or decadal?
L67 “Empirical mode decomposition-based bias correction (EMDBC)”: leveraging the adaptive nature of Empirical Mode Decomposition EMD and its ensemble variant EEMD: explain on which timescale it is used and on which timescale QDM is used
L 73: section 3: demonstrates EMDBC effectiveness-> “demonstrates” is too strong here
Methods:
L 80: why is this dataset used?
L 78. observed and modeled temperature -> detail what is used for what?
L 80: WRF-CCSM: explain acronym. Only atmosphere or coupled ?
L 81 “moodeled“
L 85. “RCP 8.5 scenario is used“: explain acronym + add reference
L 90: “3x10s^1” -> why s^1?
L 93: why is this dataset used?
L 97 “leverage observation data“-> in which way?
L 98: typo: “is used to THE learn“
L99: daily mean temp data calculated from the 3h outputs of WRF CCSM to match the temporal resolution of the observed Livneh data -> upscaling
L 102: Statistical framework is then applied to identify and learn the systematic biases in the simulation data. -> what statistical framework? which simulation data, WRF-CCSM or Livneh?
L 104 “the model generates bias-corrected future predictions that scale more closely with observational data” -> which observations are available for future predictions? Or maybe the authors refer to the next paragraph? This paragraph should probably be merged with the next paragraph for clarity. Not clear how Livneh data is used: is the data split into 2 parts?
L 116: QM and QDM: add a Graphic illustrating the concepts of BQM and QDM (many examples in the literature, e.g. HESS - Peer review - Precipitation ensembles conforming to natural variations derived from a regional climate model using a new bias correction scheme + https://doi.org/10.1007/s40641-016-0050-x -> maybe also add these two references)
L 120: F(T) and CDF: give the formula
L 122: which model outputs?
L123 and 131: Eq 1 and Eq2: “p” refers to what?
L 135: “Nonparametric empirical CDFs are commonly used for flexibility, although parametric and semiparametric distributions can also be employed” -> which formula for (semi-)parametric distributions? which adjustable parameters?
L 143: “meteorological time series“ is it meteorological modelling or climate modelling ? is it “meteorological variables” in a “climate model”
L 146: “future projections“ -> is it fitted only to climate projections or also to short term weather forecasts?
L 153: “Can suffer mode mixing” -> explain more how this occurs
L 155. Add other references using EEMD in Climate sciences: e.g.
Investigating monthly precipitation variability using a multiscale approach based on ensemble empirical mode decomposition | Paddy and Water Environment
Identification of relationships between climate indices and long-term precipitation in South Korea using ensemble empirical mode decomposition - ScienceDirect,
The multi-timescale temporal patterns and dynamics of land surface temperature using Ensemble Empirical Mode Decomposition - ScienceDirect,
A time series processing tool to extract climate-driven interannual vegetation dynamics using Ensemble Empirical Mode Decomposition (EEMD) - ScienceDirect
L160 Explain why the noise helps
L161. All the operation can be done with the Python package PyEMD?
L 171. “Total number of extracted IMFs m^s” -> why “^s”?
L 174. “To address this, we implement an additional hyperparameter-tuning step that reinforces distinct frequency separation and minimizes overlap among IMFs“-> give type of procedure used in a few words ?
Eq 6: write the formula showing how it aggregates: To=…
Eq 6: what is “tau 1 m T0“? “tau 2 m T0“?
L 177 and L 181: give the values of the thresholds tau_1 and tau_2. How are they estimated?
L 181: “selecting τ1 and τ2 such that the IMFs most closely matching each frequency range are grouped together.“ -> give the formula to select the values
L 179: maybe change the order in the sentence: “In this study, we use the butter function available in scipy (Virtanen et al., 2020) to perform bandpass filtering of the original signal, isolating the frequencies associated with each timescale” -> “In this study, we perform bandpass filtering of the original signal, isolating the frequencies associated with each timescale using the butter function available in scipy (Virtanen et al., 2020).”
[ Appendix A
L 345: What is the relative change in frequency between consecutive IMFs? Is there a gap or no gap between IMFs?
L 355: Show the case of 2 IMFs: values of f0, f1, f2
L 355: Tuned through cross-validation? with which data?
L 356: yielded satisfactory results: which criterion is used?
In Algo 1: Recompute f(j)max -> how are the frequencies updated?
A schematic of the algorithm would help.]
Fig 1: how are regions selected? Add color bar or remove colors.
Fig 2: Difficult to understand which are the important information on the plots
Fig. 2: Why not directly compare the observations vs. the corrected output on the same plots, the input vs. the corrected output on the same plots, and the timescales vs. the corrected timescales on the same plot?
Fig 2: A schematic of the method would help understand the successive plots
L 186 “the nature of the biases can vary greatly depending on whether we are dealing with short-term fluctuations (e.g., biweekly scales) or longer-term patterns (e.g., seasonal or annual). “ -> indicate how it changes
L 191: “At the biweekly scale, signals often exhibit substantial variability and frequent extremes, yet show little in the way of stable temporal patterns that persist across years”: reference?
L 192- 193“Because a more complex regression approach is unlikely to provide significant benefits at this resolution, we use the QDM to correct these components“: reference?
L 202: Were other methods tested to confirm the hypothesis that QDM is the best here?
L 203 “At longer timescales (seasonal or annual), biases often manifest in more systematic patterns that persist across multiple years.” -> reference?
L 237: Here a schematic showing the successive steps would be very helpful
L 237: Was the QDM tested also on these timescale to confirm the hypothesis that the used method is better than the QDM on these timescales?
L 237: When is EEMD applied?
2.5 Visualization:
L 239: Give the numbers of the figures here -> is a sub section (2.5) needed just for 1 sentence?
Results
- Fig 3: in the caption: explain what values are represented in the boxes vs. lines vs. dots
- L 242: “we apply a spatial smoothing procedure to the bias corrected daily temperature fields” can you justify why?
- L 253: “Section 2.1.Figure 3“ need a space before “Figure”
- L 257: give the definition (and/or a reference) for the WD
- L.258: Fig 3 top: comment on why larger bias for some regions (S, N, MidWest) and larger uncertainties for SW, NW?
- Fig 3 bottom: why similar WD for all region although biases are bigger for some regions (top)? I would expect a larger WD for the regions with higher biases?
- Fig 4 and L273: why do the Northern and Midwest regions have larger biases for the QDM-corrected datasets than for the uncorrected datasets? -> should it not be used here then?
- Figure 5: add letters for sub plots (also in other figures):
- Figure 5: difficult to see something. In particular in the top 2 plots: we cannot see the curves. “While QDM achieves performance comparable to EMDBC at the daily (training) scale,” -> how is this observed?
- Figure 5: in the legend add the “Reference datasets” to the “Livneh” legend. Change the Livneh curve to a dotted line.
- Figure 5: the biweekly plot is not commented in the caption
- Figure 6: explain what is plotted in the box plots (box, line, dots) and in the violin plots (are the tails cut? what are the black box and the white line?)
- Figure 6: not color-blind compatible?
- L. 275 “These results demonstrate that EMDBC successfully preserves bias-corrected signals over a broad range of temporal frequencies” -> but it is a specific dataset, is it representative of other regions and other periods?
- 3.2 “Over full domain” which domain here? temporal spatial? which domain was used previously?
- Figure 7: what should the “Mid-century” violin plots be compared to? It is misleading to have them on the same plot as the historical data -> it should not be compared to it? Maybe bring the violins that should be compared closer to one another, e.g. 3 blocks: 1 historical block (Livneh + historical models) -> gap -> 1 block with 3 mid-century violins -> gap -> 1 block with 3 late-century violins
- L 285 “In each sub-region, the top panel compares the absolute temperature bias between the model projected and the observed series before and after correction with EMDBC and QDM, whereas the right panel shows the distribution of the average temperature” -> “the top panel”… “whereas the right panel” ? is it here “whereas the bottom panel”?
- Fig 8: CONUS is not defined?
- L 295: Fig 9 and Fig8 -> Fig 8 and Fig 9
- Fig 8 and Fig9. give letters to the subplots
- L 295: please comment each of the subplots or remove if not used in the text.
- Fig 8 and 9: why northern America has large biases (left column)?
Conclusion:
L303-304: add the spatial extent of the output on which it is applied, and the resolution.
L304: add “temporal” to “Temporal downscaled”
L305: add the years to the periods: “historical 19..-19.., mid-century 20..-20.. and late-century 20..-20..”
L312: “meteorological variables“ -> „atmospheric“ ?
Code and reproducibility: The zenodo repository contains only 1 script for 1 figure that is not one of the figures of the paper. It would be nice to have the scripts to compute plots of the paper. Could the repository be added to a public github page ? The link given is not public: https://git.cels.anl.gov/jfeinstein/emdbc-paper (Feinstein, 2025). Thank you.
Citation: https://doi.org/10.5194/egusphere-2025-1112-RC2 -
AC5: 'Reply on RC2', Arkaprabha Ganguli, 09 Jul 2025
Thank you for your careful and constructive review. We appreciate the time and effort you devoted to evaluating our work. We attach here our point-by-point response letter. We hope our responses address your concerns, and we would be happy to provide any additional clarification if needed.
-
AC6: 'Comment on egusphere-2025-1112', Arkaprabha Ganguli, 11 Jul 2025
We thank the reviewers once again for their constructive feedback. We have provided detailed answers to every comment, added the requested plots, and believe this discussion has significantly improved the manuscript. With our responses now complete, we are finalising the discussion.
Citation: https://doi.org/10.5194/egusphere-2025-1112-AC6