This work is distributed under the Creative Commons Attribution 4.0 License.
Exploring the Potential of History Matching for Land Surface Model Calibration
Abstract. With the growing complexity of land surface models used to represent the terrestrial part of wider Earth system models, the need for sophisticated and robust parameter optimisation techniques is paramount. Quantifying parameter uncertainty is essential for both model development and more accurate projections. In this study, we assess the power of history matching by comparing results to variational data assimilation, commonly used in land surface models for parameter estimation. Although both approaches have different setups and goals, we can extract posterior parameter distributions from both methods and test the model-data fit of ensembles sampled from these distributions. Using a twin experiment, we test whether we can recover known parameter values. Through variational data assimilation, we closely match the observations. However, the known parameter values are not always contained in the posterior parameter distribution, highlighting the equifinality of the parameter space. In contrast, while more conservative, history matching still gives a reasonably good fit and provides more information about the model structure by allowing for non-Gaussian parameter distributions. Furthermore, the true parameters are contained in the posterior distributions. We then consider history matching's ability to ingest different metrics targeting different physical parts of the model, helping to reduce parameter space further and improve model-data fit. We find the best results when history matching is used with multiple metrics; not only is the model-data fit improved, but we also gain a deeper understanding of the model and how the different parameters constrain different parts of the seasonal cycle. We conclude by discussing the potential of history matching in future studies.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2996', Toni Viskari, 15 Mar 2024
This is a review of the manuscript “Exploring the Potential of History Matching for Land Surface Model Calibration” submitted to Geoscientific Model Development by Raoult et al. In the work included in the manuscript, the authors examine the benefits of using History Matching (HM) for Land Surface Model (LSM) calibration with the ORCHIDEE LSM by conducting a twin experiment reflecting a site often used for ORCHIDEE calibration. In addition, they compare the HM results with the performance of VarDA calibration.
Regarding the HM implementation/experiment part, I thought the manuscript was well done and successful. What is being done here is fairly basic, but the paper is open about that and establishes itself more as foundational work that will be expanded upon in the future. There are some parts where clarification is needed, but overall I was quite satisfied with the work done and how it is presented here.
Which, however, brings us to the VarDA comparison, as I did not really comprehend its purpose. As the manuscript itself admits in the discussion section, the two calibration methods function so fundamentally differently that the results really cannot be compared effectively. VarDA is ultimately an optimization method, especially when applied in the manner used here, and even the uncertainty approximation used in the work is something of a fix, as the approach itself does not actually produce uncertainties. That has been one of the great challenges in using 4D-Var in forecast services and has produced several different workarounds.
So to use VarDA, which essentially was 4D-Var here, to estimate parameter values and then compare those with a method that only estimates uncertainties was odd, to be honest. Furthermore, the more commonly used MCMC methods actually produce uncertainty estimates that would be a more apt comparison for the HM results. While the discussion does touch on why MCMC wasn’t used here, this should have been discussed more explicitly earlier, and even then it remains questionable. In truth, it almost felt like the authors were worried that the paper would have been too short or limited without the comparison, but it doesn’t really add that much.
Apologies for the brusqueness of the feedback regarding the comparison, but it is driven by how much I did appreciate the HM part of the work. If this paper were still in the preparation phase, I would strongly argue for removing the comparison, but at this stage that is too drastic an action to take. And while I continue to have my issues with the VarDA part, I do think the rest of the paper is strong enough to be considered for publication.
Thus my recommendation is to return for major revision with the focus, in addition to the detailed comments later on, being on strengthening the reasoning and expectations for the VarDA comparison both in the introduction and in the methods section.
Below are my line-by-line comments for the manuscript. As a general comment, though, there are several points where the manuscript refers to something as usually or commonly done, which I feel should almost all be removed. Just state what is done and why instead of using generic references such as these, especially because they are both unnecessary and, in some cases, arguable.
Line-by-line comments:
Line 20: “However, despite their increasing complexity…”
This comment is so nit-picky that I feel compelled to apologize for it in advance, but I might argue against the word despite here. While it is undeniable that there have been attempts to add more complexity to LSMs, a lot of those processes are intertwined. Hence it would make sense that, at least in the beginning, one would expect to see those major uncertainties remain, as they have more room to spread, so to speak.
Not a critical comment and feel free to ignore, but still something to potentially rephrase here.
Line 23: “…terrestrial biosphere is becoming a critical scientific priority…”
I would just expand that it is also a policy priority, as the results of these models are used as the basis for future plans, as explained in the example following this part.
Line 28: “DA can be used to improve the initial state of the model and/or the internal model parameters.”
This part, along with the following more detailed examples, is a bit misleading in my opinion, as DA is more commonly used to continuously update the state variables as new observations become available. While that new estimated state is then used as the basis for the next projection, it also contains information from the preceding model states, so calling it the initial state is not completely accurate.
Also, on the numerical weather comparison after this part, I again felt it was a bit inaccurate. First of all, in order for parameter estimation to be even possible, you need a known equation to calibrate. So the fact that the equations were known is not an explanation for why parameter estimation isn’t the focus. Furthermore, and more importantly, because of the chaotic nature of weather systems, error related to the current state spreads faster and ends up dominating the future projection error. I would argue that is a more central reason why state data assimilation is used there and why, with LSMs, it makes more sense to focus on model parameter uncertainties.
By the way, this part isn’t meant to be nit-picky like my first comment, but reading through it I felt it asserted confusing things, for example about what kind of equation exists, since one cannot use these methods without an equation to begin with. So the issue with LSMs, for example, isn’t that they rely on empirical equations, but that there are questions about how generally applicable a set of parameters for those empirical equations is and in which situations they should be recalibrated.
Line 32: “Furthermore, we often rely on variational data assimilation methods...”
I strongly disagree with the implication here that variational methods are the common approach in climate studies. They are used, obviously, but MCMC-based methods are still much more widely used, at least in my experience. This is actually my biggest source of confusion with this paper, something I will delve into more deeply later, in that for this particular experiment a comparison of History Matching to an MCMC approach would have been much more fitting due to the uncertainty aspect.
Additionally, the description of the variational method is a bit odd: while technically correct, in 3D variational assimilation the assimilation is done for each observation moment separately. So in those cases it is not really a time window, at least not in the manner indicated here.
Line 50: “- for example, the likelihood…”
I was very confused by the claim here. Yes, by its very nature the likelihood will generally be univariate and smooth, as it is a continuous variable with only one maximum/minimum. However, it does nothing to address the equifinality issue that is a fundamental challenge for calibration in general, emulators included.
Let us say that we have two parameter sets that produce nearly the same likelihood, which in itself would be relatively straightforward in a complex LSM if we are looking at multiple outputs, due to the cost function determining the likelihood. In that case, as we approach these parameter sets, the likelihood function will continue to be univariate and smooth in a similar fashion, even though it is driven by two different sets.
This isn’t to argue against using likelihood for the emulator, rather that I don’t understand how it solves any of the issues raised before it?
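As a concrete aside on the equifinality point, a toy example (entirely hypothetical, not drawn from the manuscript) shows how two very different parameter sets can yield essentially the same likelihood, so a smooth, easy-to-emulate likelihood surface does not by itself resolve the ambiguity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model whose output depends only on the product of two parameters, so any
# (a, b) pair with the same product is indistinguishable from the data.
def model(a, b, t):
    return a * b * np.sin(t)

t = np.linspace(0.0, 2.0 * np.pi, 50)
obs = model(2.0, 3.0, t) + rng.normal(0.0, 0.1, t.size)   # synthetic "truth" plus noise

def log_likelihood(a, b, sigma=0.1):
    resid = obs - model(a, b, t)
    return -0.5 * np.sum((resid / sigma) ** 2)

# Three very different parameter sets, essentially identical likelihoods:
for a, b in [(2.0, 3.0), (3.0, 2.0), (1.0, 6.0)]:
    print(a, b, round(log_likelihood(a, b), 2))
```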
Line 116: “2.2.2 Variotional data assimilation”
In addition to the VarDA application here seemingly being 4D-Var without ever being directly identified as such, there is also no discussion at all in the methodology section of the required adjoint version of the model. This in turn is a bit confusing, as at least the gradient-based algorithm used for the calibration rests on the assumption that all the information is transferred to the same point in time in order to calculate the gradient.
Now, I do realize there have been ways to sidestep this in previous publications discussing why the adjoint was not used, and even here it is mentioned briefly in the discussion. However, even when choosing such approaches, the methods section should be transparent about that as well as about the reasoning behind the choice. This is especially important here because the previous works that have avoided the adjoint worked with simpler systems than what is done here. Which naturally raises the question of whether some of the challenges VarDA is having are at least partially due to skipping that part of the process, as the required assumptions no longer hold as strongly.
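For context, the generic 4D-Var cost function and its gradient take the form below; it is the transport of the observation-term gradient back to a single point in time that normally requires the adjoint (the transpose of the linearised model/observation operator). This is the textbook form, not necessarily the exact formulation used in the manuscript.

```latex
J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}^{b})^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}^{b})
 + \tfrac{1}{2}\sum_{t}\big(\mathbf{y}_{t}-H_{t}(\mathbf{x})\big)^{\mathrm{T}}\mathbf{R}_{t}^{-1}\big(\mathbf{y}_{t}-H_{t}(\mathbf{x})\big),
\qquad
\nabla J(\mathbf{x}) = \mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}^{b})
 - \sum_{t}\mathbf{H}_{t}^{\mathrm{T}}\mathbf{R}_{t}^{-1}\big(\mathbf{y}_{t}-H_{t}(\mathbf{x})\big)
```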
Line 174: “The value of a is often…”
This part was a bit confusing to me: is a set to three here or not? If it is, just state so instead of writing how it is generally done.
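For readers less familiar with HM, the implausibility measure and cutoff under discussion typically take the generic form below, with the threshold a = 3 commonly justified by Pukelsheim's three-sigma rule; whether the manuscript fixes a to exactly this value is what the comment asks to be stated plainly. The notation here is generic (z the observed metric, f the simulator or its emulator, and observation, discrepancy, and emulator variances in the denominator), not copied from the paper.

```latex
I(\mathbf{x}) \;=\; \frac{\big|\, z - \mathrm{E}[f(\mathbf{x})] \,\big|}
{\sqrt{\mathrm{Var}_{\mathrm{obs}} + \mathrm{Var}_{\mathrm{disc}} + \mathrm{Var}_{\mathrm{em}}(\mathbf{x})}},
\qquad
\mathcal{X}_{\mathrm{NROY}} \;=\; \{\, \mathbf{x} : I(\mathbf{x}) < a \,\}
```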
Line 201: “…standard deviation set to 0.1 times the time series’ mean.”
Why the time series mean? Especially if you are looking at seasonal variables, where the assumption would be that the uncertainty is at least partially relative to the measured value?
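To make the contrast explicit, a hypothetical sketch of the two error models being compared: a single standard deviation tied to the time-series mean (as the manuscript describes) versus an error proportional to each measured value. The numbers are illustrative only.

```python
import numpy as np

flux = np.array([1.0, 1.5, 3.0, 6.0, 8.0, 7.5, 5.0, 2.0])   # hypothetical seasonal cycle

sigma_constant = 0.1 * flux.mean() * np.ones_like(flux)     # one value for the whole series
sigma_relative = 0.1 * flux                                  # 10 % of each observation instead

print(sigma_constant)   # same weight given to winter and summer residuals
print(sigma_relative)   # peak-season residuals allowed to be larger
```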
Line 203: “…where prior uncertainty is set to 100 % of the parameter range of variation in order to allow for maximal space exploration.”
I don’t quite understand this, as the prior uncertainty is still a normal distribution, correct? So if in this case it is set to 100 %, that would imply it is the spread in the prior error covariance matrix, which in turn would only cover approximately two thirds of the parameter range, with the rest of the uncertainty lying beyond it? Reading through the sentence multiple times, I think I get what it is stating, in that you are exploring the whole range of allowed parameters, but that is how calibration should work anyway based on prior knowledge, shouldn’t it?
Furthermore, if I am understanding this correctly, you are essentially setting the prior uncertainty so high that the prior state doesn’t really affect the calibration? Which again raises the question of why even do this experiment with VarDA, as you are handicapping it here by working against its strengths?
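The coverage point can be checked numerically. Assuming the prior is Gaussian, centred in the middle of the range, with a standard deviation equal to the full range width (one reading of "100 % of the parameter range of variation"), only about 38 % of the prior probability mass falls inside the physically allowed range:

```python
from scipy.stats import norm

lo, hi = 0.0, 1.0                              # hypothetical parameter bounds
width = hi - lo
mu, sigma = 0.5 * (lo + hi), 1.0 * width       # prior centred in range, sigma = 100 % of range

inside = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)
print(f"prior mass inside the physical range: {inside:.2f}")   # ~0.38
```

If instead σ is interpreted as half the range (i.e. the range being ±1σ), the fraction rises to roughly two thirds, which appears to be the reading in the comment above; under either reading a non-negligible part of the prior lies outside the physical bounds.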
Line 206: “Here, we focus on the first year…”
You are using data from only one year to calibrate parameters affecting seasonally varying quantities? Why not use multiple years? Especially since you are running a twin experiment, which involves creating a synthetic time series anyway?
Line 241: “3. Results”
I will write a generic comment on this section instead of raising the issue line by line. One can have a combined “Results and discussion” section, or one can have them separately. Here, though, despite there being those individual sections, there are multiple points in the results where, amidst presenting the experiment, there are lines that theorize on the meaning of the results. This is essentially text that belongs in the discussion, where it could be contextualized together instead of the scattered approach here.
Thus my suggestion is to go through this whole section and consider moving the lines about the implications of the results to the discussion section. Not only will that make the results section easier to read, it will also allow a more concrete analysis of what can be deduced from the experiments.
Line 322: “This is further reduced to less…”
This is a bit unclear, as the sentence before states that the cutoff value was reduced by over 80 % during the first pass, with the following part being about how it went to less than 10 % after ten iterations. So is the 10 % referenced here relative to the cutoff value? Also, wouldn’t this imply that the benefit of the further iterations was not that great, as the first attempt already reduced the uncertainty to near the final result?
I realize it is stated later on that after the 5th wave the improvements were marginal, but it is not obvious to me what is used as the standard for that here?
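To illustrate the wave behaviour being questioned, a toy one-parameter refocusing loop (not the authors' implementation; the model, observation, and "emulator variance" are all placeholders) shows how most of the space is typically removed in the first wave, with later waves giving diminishing returns as the emulator variance shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(p):                              # toy 1-parameter model standing in for ORCHIDEE
    return 3.0 * p + 1.0

z, var_obs = 7.0, 0.25                     # synthetic observation and error variance (truth: p = 2)
lo, hi = 0.0, 10.0                         # prior range of the parameter
cutoff = 3.0

nroy_lo, nroy_hi = lo, hi
for wave in range(10):
    candidates = rng.uniform(nroy_lo, nroy_hi, 10_000)
    # Crude stand-in for emulator variance: it shrinks as the design points
    # concentrate in a smaller NROY region, which is what drives later waves.
    var_em = 0.5 * (nroy_hi - nroy_lo)
    imp = np.abs(z - model(candidates)) / np.sqrt(var_obs + var_em)
    kept = candidates[imp < cutoff]
    frac = (kept.max() - kept.min()) / (hi - lo)
    print(f"wave {wave + 1}: NROY spans {100 * frac:.1f} % of the prior range")
    if (nroy_hi - nroy_lo) - (kept.max() - kept.min()) < 0.01:
        break                              # improvements from further waves are marginal
    nroy_lo, nroy_hi = kept.min(), kept.max()
```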
Line 340: “3.2 Implementing process-oriented metrics”
I feel there is a lot of text about implementation and reasoning here that belongs in the methods section.
Line 404: “…we compare it to VarDA typically…”
Again, just disagreeing with the argument that VarDA is the typical approach used for model calibration, even with LSMs.
Line 443: “The example we test here is illustrative, but very simple.”
First, again to be nitpicky, but I think it should be simple, but illustrative.
Second, and more to the actual commentary, I am not quite certain I understand how this is illustrative, as the manuscript tests the methodology with data from a single site. So there is not that much that can be said yet about multisite performance, especially because multisite calibrations have their own challenges. While it is true that they usually address, at least partially, the equifinality issue, they also lead to wider parameter uncertainty distributions, as there are dynamics at various sites that are not explicitly included in the models. Which in turn is a wider question about how HM would perform in those circumstances, one the results here do not give insight into.
Line 470: “The true strength of HM is its ability to identify structural issues.”
This is a tricky one. I don’t agree with this; however, at the same time I feel this might be more of a terminological question. For me, structural issues relate more to the actual model equations and included dynamics, which HM in itself is not any better at isolating than any other calibration method. Don’t get me wrong, HM is a useful testing tool in such situations, but I would not call it more than that, as it still relies on the expertise of the user.
If, though, the structural error here refers more to how we set the priors of the parameter values themselves, then I would agree that that is something where the application of HM methods has value.
Citation: https://doi.org/10.5194/egusphere-2023-2996-RC1 - AC2: 'Reply on RC1', Nina Raoult, 29 May 2024
RC2: 'Comment on egusphere-2023-2996', Anonymous Referee #2, 14 May 2024
This study explores the use of history matching and the not-ruled-out-yet (NROY) parameter space to calibrate land surface model parameters and quantify uncertainty on model outputs. The method is demonstrated in a twin experiment with the ORCHIDEE model (i.e. using model-generated "observations" with known parameters), and compared to parameter optimisation and uncertainty characterisation by a gradient-based method and a global-search method.
The paper is very clearly written, and provides a valuable demonstration of the history matching / NROY method. The comparison with the gradient-based and global search methods is an important part of the paper, as these methods have often been used to optimise parameters and characterise model uncertainty in land surface models. The comparison of the uncertainty range from the ensemble of 200 optimisations with Bpost and with the HM range is also good to see. I like the exploration of different metrics. The results are significant and I recommend publication of the paper with minor revision to address the comments below.
Specific comments:
Line 11 - "the true parameters are contained in the posterior distribution" - is this guaranteed with history matching, or do you find that it occurs in this example and could there be cases where it doesn't?
Line 73 - VarDA (and the term variational) is used here to include both the gradient-based method and the genetic algorithm. I am used to using the term 'variational' to describe gradient-based methods, in contrast to terms like Monte Carlo, global search, or stochastic to describe a genetic algorithm. I wasn't able to find a definitive definition of variational, and it seems to be used differently by different authors (e.g. Santos et al. (2013, doi: 10.1590/S2179-84512013005000012) contrasting variational methods and genetic algorithms, and Schmehl et al (2011, doi: 10.1007/s00024-011-0385-0) describing a genetic algorithm variational approach). Nonetheless, I think it is worth considering whether a different term would be better to describe the two parameter optimisation methods (e.g. ParOpt for Parameter Optimisation, or ParEst or PE for Parameter Estimation) to avoid any possible confusion.
Line 156 - this equation assumes σi is the same error for all observations in each stream
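For readers without the manuscript at hand, a misfit term of the generic form below would carry exactly that assumption, with a single σ_s shared by all N_s observations in stream s; this is an illustrative form, not a quotation of the paper's equation at line 156.

```latex
J_{\mathrm{obs}}(\mathbf{x}) \;=\; \sum_{s} \frac{1}{2\,\sigma_{s}^{2}}
\sum_{i=1}^{N_{s}} \big( y_{s,i} - H_{s,i}(\mathbf{x}) \big)^{2}
```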
Line 165 - define E
Figure 1 - I don't understand the text between the first pink shape and the first purple shape "For 1st wave χ=χNROY". I would understand it if it defined the first wave χNROY=χ.
Line 225 - write 10,000 rather than 1e4
Line 313 - Q10 is the most constrained parameter *relative to the prior range*.
Line 340 - could remind the reader here that 200 GA optimisations were used e.g. "Instead, multiple GA optimisations were preferable (we used 200), which is extremely costly."
Figure 7 - Is Min/Max the quotient of the min and max of the data? Please define exactly what this metric is. What is the number beside the panel caption (i.e. 0.35 beside Min/Max, 0.06 beside Spring gradient etc)? Vertical gray lines could be added to the timeseries for b) at Feb and Apr and c) at Aug and Sep to point out the months used in the metrics. The constraint of initial carbon stocks, is that something that is often observed?
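To illustrate why the definition matters, here are the two most obvious readings of "Min/Max" together with hypothetical gradient metrics built from the months named in the comment; the data and month choices are placeholders, not the paper's definitions.

```python
import numpy as np

# Hypothetical monthly time series (Jan..Dec), purely to contrast possible definitions.
monthly = np.array([0.5, 0.8, 2.0, 4.5, 7.0, 8.5, 8.0, 6.5, 4.0, 2.0, 1.0, 0.6])

min_max_ratio   = monthly.min() / monthly.max()       # one reading of "Min/Max"
amplitude       = monthly.max() - monthly.min()       # the seasonal amplitude
spring_gradient = (monthly[3] - monthly[1]) / 2.0     # e.g. (Apr - Feb) over two months
autumn_gradient =  monthly[8] - monthly[7]            # e.g. (Sep - Aug)

print(min_max_ratio, amplitude, spring_gradient, autumn_gradient)
```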
Line 383 - be consistent in using min/max or amplitude to describe that metric.
Line 383 - Do you need to weight the different metrics when they are combined in HM?
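On the weighting question: in the history-matching literature the usual way to combine several metrics is not a weighted sum but the maximum (or second-/third-maximum) implausibility across metrics, so a parameter set is ruled out if it fails any single metric and no weights are needed; whether the manuscript does exactly this is for the authors to confirm. A minimal sketch:

```python
import numpy as np

def implausibility(z, mean, var_obs, var_em):
    """Implausibility of a single metric for one candidate parameter set."""
    return np.abs(z - mean) / np.sqrt(var_obs + var_em)

# Hypothetical emulator predictions for three metrics at one candidate point,
# together with the corresponding synthetic observations and variances.
metrics = [
    dict(z=7.0, mean=6.5, var_obs=0.25, var_em=0.10),    # e.g. seasonal amplitude
    dict(z=1.2, mean=2.0, var_obs=0.04, var_em=0.02),    # e.g. spring gradient
    dict(z=0.30, mean=0.32, var_obs=0.01, var_em=0.01),  # e.g. min/max ratio
]

imps = [implausibility(**m) for m in metrics]
ruled_out = max(imps) > 3.0   # no weighting: failing any one metric rules the point out
print(imps, ruled_out)
```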
Is there noise added to the observations used for the alternative metrics? In a twin experiment, without noise on the observations, it is easy to see how the parameters could be much better constrained than in a realistic case with actual observations.
Line 459 - In the context of land surface models, a stepwise approach to separate calibration of the fast and slow processes (as described at line 85) would benefit from this feature of the HM.
In contrast with the RC1 comment, I do believe that the VarDA part of the paper is important, as it reflects the way parameters are often optimised and uncertainties quantified in land surface models. Personally, I like the style of discussing the meaning of some of the results given in the Results section, rather than leaving all of that discussion to the Discussion section, but I guess that is a matter of style.
Citation: https://doi.org/10.5194/egusphere-2023-2996-RC2 - AC1: 'Reply on RC2', Nina Raoult, 29 May 2024
Viewed
- HTML: 372
- PDF: 160
- XML: 28
- Total: 560
- BibTeX: 41
- EndNote: 18
Simon Beylat
James M. Salter
Frédéric Hourdin
Vladislav Bastrikov
Catherine Ottlé
Philippe Peylin