the Creative Commons Attribution 4.0 License.
Effectiveness of Multivariate Bias Correction in Hydrology and Agriculture: A Systematic Review
Abstract. Climate impact assessments in hydrology and agriculture often rely heavily on outputs from Global Climate Models (GCMs). However, a fundamental scale mismatch exists between the coarse resolution of GCMs and the fine-scale, multivariate data required by impact models. While Multivariate Bias Correction (MBC) methods have emerged as a solution to restore inter-variable dependencies (e.g., the correlation between precipitation and temperature), it remains unclear whether statistical improvements in climate data translate into more accurate impact projections. This study presents a systematic review of 39 peer-reviewed articles to evaluate the added value of MBC across hydrological and agricultural domains.
Our synthesis reveals a critical "validation gap" where superior statistical performance does not consistently yield improved impact simulations. We identify a divergence in added value dictated by the characteristic response time scales of the receiving systems. Agricultural models, which are often sensitive to immediate, daily compounded extremes (e.g., heat stress during low soil moisture), demonstrate a clear benefit from MBC. In contrast, general rainfall-runoff models often function as spatiotemporal integrators, acting as low-pass filters that dampen high-frequency incoherence; consequently, simpler univariate methods frequently perform equally well for bulk streamflow simulation. Furthermore, we highlight the risks of non-stationarity, where methods calibrated to historical correlations may fail under future climate regimes. We conclude that future method development must pivot from purely statistical refinement to more process-aware, regime-dependent frameworks. The ultimate goal is to produce methods capable of addressing non-stationarity and determining when – or if – multivariate correction adds value over simpler univariate approaches.
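The "low-pass filter" interpretation in the abstract can be illustrated with a minimal sketch. It is a conceptual toy, not a model taken from the reviewed studies: a single linear reservoir, with an assumed storage constant `k` of 20 days, routes a synthetic day-to-day forcing error, and the spread of the resulting flow error is compared with the spread of the forcing error. The function name, the value of `k`, and the noise magnitude are all hypothetical choices made for illustration.

```python
import numpy as np

def linear_reservoir(inflow, k=20.0):
    """Route a daily series through one linear reservoir (outflow = storage / k)."""
    storage = 0.0
    outflow = np.empty(len(inflow))
    for t, x in enumerate(inflow):
        storage += x                 # add today's input to storage
        outflow[t] = storage / k     # release a fixed fraction of storage
        storage -= outflow[t]        # update storage after release
    return outflow

rng = np.random.default_rng(0)

# Assumed high-frequency (white-noise) error in the daily forcing.
forcing_error = rng.normal(0.0, 2.0, size=3650)

# The reservoir is linear, so the error in simulated flow equals the
# routed forcing error, independent of the underlying rainfall signal.
flow_error = linear_reservoir(forcing_error)

damping = flow_error.std() / forcing_error.std()
print(f"flow error std / forcing error std: {damping:.2f}")
```

A ratio well below one indicates that uncorrelated daily errors are strongly attenuated by storage. Persistent or systematic errors (for example a mean bias) would pass through such a filter largely undamped, which is one reason the interpretation should not be read as a universal result.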
Status: final response (author comments only)
- CC1: 'Comment on egusphere-2026-529', Nima Zafarmomen, 21 Mar 2026
- AC1: 'Reply on CC1', Bhuwan Shah, 24 Mar 2026
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-529/egusphere-2026-529-AC1-supplement.pdf
- RC1: 'Comment on egusphere-2026-529', Anonymous Referee #1, 28 May 2026
The review by Shah et al. tries to compile the current state of knowledge on the effectiveness of 'multivariate bias correction' methods. I found the topic compelling, but I think the manuscript has significant weaknesses that prevent it from being considered a 'systematic review'.
Please see my comments below, which the authors hopefully find useful to improve the manuscript.
General comments:
Title and throughout entire manuscript: I suggest using bias adjustment rather than bias correction, or at least defining the two terms explicitly. "Correction" may imply that model errors are removed, whereas generally selected statistical properties of climate-model output are adjusted and such methods cannot correct structural problems or process-level simplifications in the source data and models. I acknowledge that the literature is not consistent on this either, but from a study with this focus, I'd at least like to see a short discussion of the terms.
The use of MBC as the overarching term for all methods analysed may not be ideal, as MBC and MBCn/p/r are themselves individual methods mentioned in the study. I suggest reconsidering this.
I suggest to be a bit more nuanced in the conclusions and particularly the abstract given the points I raised during the review:
- not all univariate methods 'destroy the relationship between climate variables' entirely,
- I suggest to rather highlight the process complexity and interaction and whether the impact model uses multiple variables directly, instead of overstressing the difference between 'agriculture vs hydrology'.
I found severe inconsistencies in the representation of the numbers of studies and apparently in how they were selected. See my comments below. For a systematic review that focuses so strongly on the selection of studies and the information required from them, this is very unfortunate.
This also includes the lack of structured information from the studies - I think this could have been more detailed and standardized given the conclusions you draw from them. At the current stage I see this more as a 'narrative review' than a 'systematic review', given the information you provide from the studies. It feels 'arbitrary' which studies you decided to highlight in your text and which ones weren't mentioned. If you want to keep this as a 'systematic review', I think these issues must be addressed. For instance, I think your claims should at least be accompanied by the number of studies that fall under each claim (see my comment below on the multi vs univariate comparison).
Some literature that is, in my view, very relevant was not acknowledged, such as:
- https://www.sciencedirect.com/science/article/pii/S0022169425005517
- https://www.klimanavigator.eu/imperia/md/content/csc/klimanavigator/maraun16cccr_biascorrreview.pdf
- https://hess.copernicus.org/articles/29/4711/2025/
While I agree that they do not specifically include 'hydrology *and* agriculture', in my view these studies provide a very valuable basis for comparing your findings and important context for discussion.
Major comments:
l.30, l.47: I wouldn't overstress that RCM output is not at all usable for impact models on scales relevant for hydrology and agriculture. The CMIP6 family of GCMs and the CORDEX RCMs now have native resolutions down to <50 km and ~12.5 km (0.11°), respectively; these can be usable for regional-scale studies or where climate is spatially homogeneous.
l.92 I think 'flaw' is too strong a word here and I also don't think it applies to all 'simple' univariate adjustment methods (what about methods that don't change the distribution, e.g., the delta change method or linear scaling?). Perhaps it is a flaw from today's standpoint, given the improvement in knowledge and resources, but given how bias adjustment originated and developed, I would argue that it was a 'necessary but strong simplification'. Also, more generally, I don't see the need to outline the disadvantages of previously applied bias adjustment methods at this length and in this detail. Yes, it is important for context, but you are focussing your review on state-of-the-art methods and this is a very valid and reasonable justification already.
l.155 I feel section 2.1 and 2.2 should be merged. E.g. does the title "Selection Criteria" apply to the "Relevance Check" in Fig 2? Section 2.2 seems to be a direct part of the Figure while 2.3 seems to start with the "Selected for Review" Final Corpus, right? Also, I suggest to add some more details regarding accessibility and 'screening' and at what point of Fig 2 you exactly went from Abstract to full screening.
l.179 this bias can also be aggravated due to your focus on English language only
l.156, 176 and 179 the numbers don't agree with the figure - 60 vs 63, 39 vs 40 and 23 vs 21.
l.190 OoS - does it mean this study was out of scope? Also, the Hakala 2018 study uses QM only and is marked as univariate. Why were those included nevertheless?
l.205ff In addition to my comment at l.155, I find this chapter and its order confusing: You now mention 60-10-11 = 39 studies - but their in/exclusion has already been justified in Tables 1 and 2 and this type of screening is not appropriately included in Figure 2. Also, in chapter 2.4 you mention that 23 studies are excluded - not 21. But in the Table 2 caption you again mention 21, but two of those do not seem appropriate (N/A and OoS)... And in Figure 5, the sum is 22 studies. I find all these discrepancies rather sloppy given how much attention you claim to have paid to which study is included and which one is not.
l.225 Is this classification also applicable to more parameters? And more generally, have you come across studies that did not follow the T,P,H parameters? What if a study had only T and P, or T+P+H+Wind, or T+P+Radiation, or all? Suggest to discuss the parameter consideration in Section 2 (if you have it ready and extracted from the studies you could add the parameters to Table 1).
l.242 In my view it would be interesting to see statistics on the full classification of the screened methods (l.191) into the three classes (i.e. simply a number in brackets accompanying Fig 4). It is a bit vague why you chose these here as the "most prominent" (was there a clear cutoff)?
l.246 Is this across the literature you identified or in general? If it is in general, I find it confusing that, from this point on, results from studies outside your screening are mixed in.
l.248 There is a third option that is most appropriate but very rarely applied: out-of-sample testing - and if this method was applied in any of your studies, I would suggest to highlight it: https://www.nature.com/articles/s41558-018-0355-y. If none of your studies apply it, I would still suggest to highlight this method in the introduction section.
l.300-310 in your original "selection criteria" list, you don't mention that the studies must include a comparative analysis between multivariate methods vs univariate methods. I'd assume this is an additional strong filter. Yet, this chapter here implies exactly that when you mention "superior" and when showing Figure 5. Are you sure that all studies from Table 1 really include such a comparison - if not, this requires subsetting the studies to be appropriate for this type of analysis. I would even argue that it depends on the univariate method the multi-variate method is compared to and that you specifically show in Table 1 which univariate method it is compared to: If you have a univariate method that does not alter the P-T distribution, but 'only' scales the signal, I would assume the difference you see between multi- and univariate methods is not that compelling. Also note that the sum in Fig.5 adds up to 22 - not 21.
l.337ff I'd like to add another thought here: Streamflow simulations of hydrological models are primarily driven by precipitation. Univariate bias adjustment methods are likely able to modify the climate change model outputs more significantly and closer to the observations than a multivariate method, which needs to balance multiple 'targets'. This 'tradeoff' can lead to a superior performance for univariate methods. (ok, I see you have mentioned this at l.467-469)
l.362-363 "—such as psychrometric properties governing moist air thermodynamics (e.g., the Clausius-Clapeyron relation between temperature and saturation vapor pressure)—remain invariant." Why is this relevant (and so important to be the concluding remark for this chapter) for streamflow characteristics?
l.488ff I think additional limitations are the information available in your data source or the amount of information you distilled from the studies - the suitability/importance of uni- or multivariate adjustments depends on a variety of factors (the type and detail of model, the underlying data it can ingest, the number and complexity of processes compared to observations, the scale...). In essence, I think you should add the limitation that the selection and suitability of the method is highly context-dependent, that it also comes down to the individual study and its aim, and that not all studies involved may have gone down to that level of detail.
l.385-386 I think this statement is too generalized - based on what you wrote earlier, I think it should rather be framed in terms of "process interdependence and complexity" than as a clear distinction between agricultural and hydrological models. I see you cover most of my critical points in the following chapter; nevertheless, I think it makes sense to tone down this strong comment a bit and highlight the process impact also in this introductory sentence.
l.433 How were these 'highly erroneous results' evaluated for the future? Again, I'd like to come back to my earlier comment: you could reference studies here that conducted out-of-sample validations and their impact on non-stationarity.
l.520 I assume you must have more detailed and structured tables that distilled data from the individual studies than what is presented in Table 1? I would have loved to see a structured table with much more details from the studies (I have mentioned a few points throughout the manuscript). Such standardized and detailed tables are typically available from other comprehensive review studies as supplementary material.
Minor comments:
l.7 and l.26: Two different definitions for GCM. While both are valid according to the IPCC, I suggest to stick to one and use the more appropriate one for your bias adjustment focus. Or discuss the differences and distinguish between the two if needed.
l.36 RCM already introduced
l.62 initial state and natural variability are also considered with additional simulations by perturbed-initial-condition ensembles
l.154 what does TS stand for in the figure?
l.185 studies included *in* the final
l.369 suggest to cite from your selected studies
Citation: https://doi.org/10.5194/egusphere-2026-529-RC1
- RC2: 'Comment on egusphere-2026-529', Anonymous Referee #2, 30 May 2026
This paper provides an overview of the commonly used multivariate bias correction methods for hydrology and crop production published in the literature and highlights their advantages and, at times, shortcomings compared to univariate bias correction methods.
I think the paper is interesting. It can serve as a stepping stone to familiarize impact modelers with the bias correction methods at hand.
Major Comments:
- Some of the arguments are not always scientifically correct, or at times are difficult to understand. I also miss the connection to the main aim of the paper in several places when seeing those arguments.
- The whole Section 3.3 is somewhat chaotic and needs revision. I explain this further below.
- I find that the text sometimes deviates from scientific writing and adopts a more popular science style. The choice of wording is occasionally exaggerated. For example, there are numerous uses of phrases such as "compelling study," etc.
- The recommendations are somewhat synthetic and not very novel. For example, when it comes to bias non-stationarity, it is unclear how one can develop such methods. What are the clues or directions for doing so? In addition, there is now a new family of climate simulations called Single Model Initial-condition Large Ensembles (SMILEs), which allow for the assessment of time of emergence while accounting for internal variability. It would be good if the authors could cover how bias correction is affecting those and how SMILEs should be evaluated in this context. For a reference, please check https://hess.copernicus.org/articles/29/5695/2025/.
Specific Comments:
L27: Please use a more fundamental reference.
L32: 25–1 km does not seem right.
L33: Do you mean 250?
L35: The sentence after the hyphen is unclear.
L38: I am not sure I understand what you mean by resource allocation. The purpose of going from a GCM to an RCM is to include more processes.
L39–41: This whole sentence could have a better flow.
L50: Please avoid using assertive phrases such as "well known." Some people may not be familiar with the method.
L55: Unclear sentence.
L62–72: This whole section reads chaotic. Is bias correction really supposed to correct drift? I do not think the aim was ever to correct errors caused by internal variability. Rather, the goal of bias correction is only to correct simplification errors. The last sentence is also unclear.
L75: Panacea?
L77: A citation alone does not validate a statement. Please use different phrasing.
L79: The statement that bias correction methods assume stationarity is not correct. In fact, I would argue that most modern bias correction methods account for the non-stationarity of the climate signal. Please check the bias correction papers by Mathieu Vrac, Tootoonchi, and Teutschbein.
L83–85: This is a separate topic in climate modeling that does not connect well with the objective of the paragraph.
L89: "All others" is unclear.
L92: Please tone down words such as "profound" and "critical."
L93: No, univariate bias correction cannot destroy the dependence structure. It simply reproduces what the climate model produces. Please check François et al. (2020) or Tootoonchi et al. (2022).
L95: "Without regard to the others" is unclear.
L97–100: This sentence is unclear.
L116: Why is it specifically needed in those locations?
L121–123: Examples?
Fig. 4: What are P, T, and H?
Table 3: The MRec method was not developed by François et al. (2020). Please find and cite the primary reference.
Table 3: What about examples of Successive Conditional methods?
L261: Examples of papers doing so?
The whole Section 3.3 is chaotic. This preface should be shortened and made more concise. In many ways, it repeats what is discussed in the following pages. What does "arbiter" mean in this context? L293–299: What is this whole section based on? We have not yet read the synthesis report, so this information is hidden from the reader.
L305: What is meant by "stark contrast"?
L311: I think the hydrology part should come before the agricultural one. This has been the order throughout the paper.
L316: What is meant by an "unambiguous conclusion"?
L342: Please tone down "compelling."
L370: What is meant by a "landmark study"?
L414: What is a "canonical example"?
L415–418: The whole paragraph is unclear, and there are no references.
Citation: https://doi.org/10.5194/egusphere-2026-529-RC2
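Referee #2's remark on L93 of the manuscript (that univariate correction reproduces the dependence structure of the climate model rather than destroying it) can be illustrated with a minimal sketch on synthetic data. The example assumes a simple empirical quantile mapping applied separately to each variable over a single calibration period; the function names, distributions, and correlation values are hypothetical and are not drawn from the manuscript or the reviewed studies. Because the mapping is monotone, it leaves ranks unchanged, so the rank correlation of the adjusted pair equals that of the raw model pair, not that of the observations.

```python
import numpy as np

def rank_corr(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks (ties not handled)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def quantile_map(model, obs):
    """Empirical quantile mapping: map each model value to the observed quantile of equal rank."""
    ranks = np.argsort(np.argsort(model))
    quantiles = (ranks + 0.5) / len(model)
    return np.quantile(obs, quantiles)

rng = np.random.default_rng(1)
n = 2000

# Synthetic "model" temperature and precipitation with strong negative dependence.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.6], [-0.6, 1.0]], size=n)
model_t = 15.0 + 5.0 * z[:, 0]
model_p = np.exp(z[:, 1])

# Synthetic "observations" with different marginals and weaker dependence.
w = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.2], [-0.2, 1.0]], size=n)
obs_t = 12.0 + 4.0 * w[:, 0]
obs_p = 2.0 * np.exp(0.8 * w[:, 1])

# Univariate adjustment: each variable is corrected independently of the other.
adj_t = quantile_map(model_t, obs_t)
adj_p = quantile_map(model_p, obs_p)

rho_model = rank_corr(model_t, model_p)
rho_adj = rank_corr(adj_t, adj_p)
rho_obs = rank_corr(obs_t, obs_p)
print(f"model: {rho_model:.3f}  adjusted: {rho_adj:.3f}  observed: {rho_obs:.3f}")
```

The sketch only addresses rank dependence under a monotone mapping. Linear (Pearson) correlation can still change because the marginals change, and adjustment schemes that are applied per month or per season are no longer a single monotone transform of the full series.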
This manuscript presents a systematic review of multivariate bias correction (MBC) methods for climate-model outputs used in hydrology and agriculture. A main contribution is the identification of a “validation gap”: methods that improve inter-variable dependence in climate data do not always improve downstream impact simulations. The review highlights bias non-stationarity, degradation of temporal autocorrelation, challenges in high-dimensional settings, and the possibility that impact-model results can be influenced by the structure of the impact model itself rather than by the climate correction method alone. The paper is well worth publishing in HESS. I have only a few minor revisions to suggest:
1) Please resolve all numerical inconsistencies in the review flow. The manuscript alternates between 60 vs 63 unique studies and 39 vs 40 included studies. These numbers must match across the text, Figure 2, Table 1, and Section 2.
2) In several places, claims are written too categorically. Phrases such as “clear superiority,” “overwhelmingly positive,” or “100% of relevant studies” should be softened or accompanied by exact denominators and caveats about sample size and study heterogeneity.
3) The manuscript would benefit from one compact table that separates studies by domain, validation type, number of variables corrected, and whether the comparison included a univariate benchmark. That would make the synthesis more transparent.
4) On the hydrology side, it would be helpful to distinguish more explicitly between streamflow simulation, snow/hydroclimate process simulation, and flood/drought hazard applications, since the value of MBC appears to differ across these subdomains.
5) Please check wording around method classes. Some readers may find the distinction between “marginal/dependence,” “all-in-one,” and “successive conditional” too brief; one more sentence on advantages and drawbacks of each class would improve accessibility.
6) There are a few places where the paper appears to conflate review evidence with author interpretation. For example, the “low-pass filter” explanation is good, but should be introduced as a synthesis or conceptual interpretation rather than as a directly demonstrated universal result.
7) Because this is aimed at an interdisciplinary audience, a short glossary of major acronyms such as MBCn, dOTC, MRNBC, MRQNBC, and R2D2 would improve readability, despite Table 3 already helping with this.
8) I do strongly recommend that the authors cite the recent study, “Analysis of historical global warming impacts on climatological trends for the partially gauged Hirmand River Basin based on multiple data products and bias correction methods,” as it is relevant to the manuscript’s discussion of bias-correction performance, trend preservation, and hydroclimatic applications in a data-scarce basin.