Mapping soil moisture across the UK: assimilating cosmic-ray neutron sensors, remotely-sensed indices, rainfall radar and catchment water balance data in a Bayesian hierarchical model

Peter E. Levy and the COSMOS-UK team

doi:https://doi.org/10.5194/egusphere-2023-2041

Preprints

12 Sep 2023

Abstract. Soil moisture is important in many hydrological and ecological processes. However, data sets which are currently available have issues with accuracy and resolution. To translate remotely-sensed data to an absolute measure of soil moisture requires mapped estimates of soil hydrological properties and estimates of vegetation properties, and this introduces considerable uncertainty. We present an alternative methodology for producing daily maps of soil moisture over the UK at 2-km resolution ("SMUK"). The method is based on a simple empirical model, calibrated with five years of daily data from cosmic-ray neutron sensors at ~40 sites across the country. The model is driven by precipitation, humidity, a remotely-sensed "soil water index" satellite product, and soil porosity. The model explains around 70 % of the variance in the daily observations. The spatial variation in the parameter describing the soil water retention (and thereby the response to precipitation) was estimated using daily water balance data from ~1200 catchments with good coverage across the country. The model parameters were estimated by Bayesian calibration using a Markov chain Monte Carlo method, so as to characterise the posterior uncertainty in the parameters and predictions. We found that the simple model could emulate the behaviour of a more complex process-based model. Given the high resolution of the inputs in time and space, the model can predict the very detailed variation in soil moisture which arises from the sporadic nature of precipitation events, including the small-scale and short-term variations associated with orographic and convective rainfall. Predictions over the period 2016 to 2023 demonstrated realistic patterns following the passage of weather fronts and prolonged droughts. The model has negligible computation time, and inputs and predictions are updated daily, lagging approximately one week behind real time.

Received: 05 Sep 2023 – Discussion started: 12 Sep 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

06 Nov 2024

Mapping soil moisture across the UK: assimilating cosmic-ray neutron sensors, remotely sensed indices, rainfall radar and catchment water balance data in a Bayesian hierarchical model

Peter E. Levy and the COSMOS-UK team

Hydrol. Earth Syst. Sci., 28, 4819–4836, https://doi.org/10.5194/hess-28-4819-2024, 2024

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-2041', Anonymous Referee #1, 27 Nov 2023

The authors have done a job on high-resolution soil moisture modeling at the UK scale. The paper is well structured, but a major revision is needed before publication. My main issues include:

1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.

2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.

3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?

4. The information presented in Fig. 3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.

Citation: https://doi.org/10.5194/egusphere-2023-2041-RC1
- AC1: 'Reply on RC1', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken. Their comments are shown in italics; our response is beneath in normal font.
  My main issues include:
  1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
  
  - A good idea - we will add this in the revision.
  
  2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
  - This is straightforward because the COSMOS sites are simply matched to the 2-km square they are located in. We can state this in the revision, and discuss other options (e.g. using data from the surrounding grid cells to interpolate to the COSMOS site location).
  
  3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
  - Where these do occur, they would indeed cause a step change of unknown size in the parameters we are estimating. The NRFA data include metadata on any known man-made changes of this kind, and we have tried to remove data prior to these changes where they have occurred. Of the 1200+ catchments, relatively few are affected, most of which have been identified and removed, so we do not think this is a major problem with the analysis. We can add text to this effect in the revision to the manuscript.
  
  4. The information presented in Fig. 3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
  
  - Adding rainfall to the existing figure is straightforward. Showing some contrasting months is also easy, but will require a separate figure. We can do both in the revisions.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC1
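
The site-to-grid matching described in the reply to issue 2 above (each COSMOS site is assigned to the 2-km square it is located in) can be sketched as follows. This is an illustration only, not the authors' code: the function name `grid_cell`, the grid origin at (0, 0) and the site coordinates are assumptions made for the example.

```python
import math

CELL_SIZE_M = 2000.0  # 2-km grid squares, as stated in the reply


def grid_cell(easting_m, northing_m, origin_e=0.0, origin_n=0.0, cell=CELL_SIZE_M):
    """Return (column, row) of the grid square that contains a point.

    The origin and the use of metre-based eastings/northings are assumptions
    for this sketch; the preprint does not specify the grid definition here.
    """
    col = math.floor((easting_m - origin_e) / cell)
    row = math.floor((northing_m - origin_n) / cell)
    return col, row


# Hypothetical site coordinates in metres (not a real COSMOS-UK site).
site_easting, site_northing = 451234.0, 267890.0
print(grid_cell(site_easting, site_northing))  # (225, 133)
```

The alternative mentioned in the reply, interpolating from surrounding grid cells to the site location, would replace the floor operation with a distance-weighted average of neighbouring cells.
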
RC2: 'Comment on egusphere-2023-2041', Anonymous Referee #2, 10 Dec 2023
The present manuscript aims at predicting soil moisture for the whole UK using a new hydrological model approach based on statistical considerations and data from discharge gauges, remote-sensing products, and cosmic-ray neutron sites. The authors explain the mathematical background of their model in detail and extensively discuss parts of the used data, their results, and the model limitations. To me the introduction of the mathematical approach is interesting, but it seems to combine a lot of new concepts and ideas, such as an EMA filter, a slope m, a mixed effects model, complex kriging algorithms, Bayesian statistics, etc. It is not clear to me to what extent this is all new or already established. It is also not clear to me how these ideas are backed by previous research. If the approach is completely new, I would relate this manuscript more to a journal for hydrological model development. The key is the invention of an (apparently) new model approach, while the use of the highly advertised COSMOS data here turned out to be just a very minor aspect of the study. Not being a hydrological modeler, I cannot evaluate the choices the authors made on the way, but I feel that comparisons to existing models are largely missing. Once the model development is accepted by the hydrological modeler community, a second paper could integrate new data sets, such as COSMOS-UK, to study its performance. Hence, I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
# Major concerns
As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.

The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
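
For readers unfamiliar with the pulse-decay idea under discussion, the following is a minimal sketch of an exponential moving average (EMA) filter applied to a rainfall series. It only illustrates the general technique: the recursion form, the smoothing parameter `alpha` and the rainfall values are assumptions for the example and are not taken from the manuscript's equations.

```python
def ema_filter(precip, alpha, initial=0.0):
    """Exponential moving average of a precipitation series.

    s[t] = alpha * p[t] + (1 - alpha) * s[t-1]

    After a single rainfall pulse the filtered value decays geometrically
    by a factor (1 - alpha) per time step, i.e. a discrete exponential decay.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    filtered = []
    state = initial
    for p in precip:
        state = alpha * p + (1.0 - alpha) * state
        filtered.append(state)
    return filtered


# One 10 mm rainfall pulse followed by three dry days (hypothetical data).
rain = [10.0, 0.0, 0.0, 0.0]
wetness = ema_filter(rain, alpha=0.5)
print(wetness)  # [5.0, 2.5, 1.25, 0.625]
```

Because the filtered series is a linear function of past precipitation, it can enter a linear regression as an ordinary covariate, which is the sense in which an exponential response is "linearised".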

A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared with, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.

The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
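
The three metrics named here can be compared on a toy case. The sketch below uses the standard definitions (RMSE, Pearson correlation, and the Kling-Gupta efficiency of Gupta et al., 2009); the function names and the observed and simulated values are hypothetical and do not come from the manuscript.

```python
import math


def mean(x):
    return sum(x) / len(x)


def std(x):
    """Population standard deviation."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def rmse(obs, sim):
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def pearson_r(obs, sim):
    mo, ms = mean(obs), mean(sim)
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    return cov / (std(obs) * std(sim))


def kge(obs, sim):
    """Kling-Gupta efficiency: combines correlation, variability ratio and bias ratio."""
    r = pearson_r(obs, sim)
    alpha = std(sim) / std(obs)    # variability ratio
    beta = mean(sim) / mean(obs)   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical volumetric soil moisture: simulation has perfect dynamics
# but a constant wet bias of 0.1.
obs = [0.20, 0.30, 0.40]
sim = [0.30, 0.40, 0.50]
print(rmse(obs, sim), pearson_r(obs, sim) ** 2, kge(obs, sim))
```

In this toy case R² is 1 because the dynamics match exactly, while RMSE and KGE both penalise the constant bias, which shows why the metrics are complementary rather than interchangeable.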

I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
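
The train-on-some-sites, test-on-others procedure requested here is commonly implemented as leave-one-site-out cross-validation. A minimal sketch of the data split, with a hypothetical function name and hypothetical site labels, and without any model fitting:

```python
def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each unique site.

    Every observation from the held-out site goes to the test set, so the
    model never sees that site during calibration.
    """
    for held_out in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != held_out]
        test = [i for i, s in enumerate(site_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical site label for each daily observation.
sites = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_site_out(sites))
for held_out, train, test in folds:
    print(held_out, train, test)
```

With roughly 40 sites this means about 40 recalibrations, one per fold.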

The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.

# Minor concerns
The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1, and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.

The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).

Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.

# Specific comments
## Abstract:
The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.

The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 

Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.

The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?

## Manuscript
Line 26: Consider mentioning also the useful integration depth of this measurement technique. 

Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!

Line 36: replace "are" by "and" (...influenced)

Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.

Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC2
- AC2: 'Reply on RC2', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken and attention to detail. Their comments are shown in italics; our response is beneath in normal font.
  I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
  # Major concerns
  
  1. As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.
  
  - We accept this point, and can add some text to the introduction on existing soil moisture products.
  2. The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
  
  - We find this a strange comment. The assumptions are explicit in these lines and in the equations, as well as in the referee's comment itself. We are not "neglecting porosity, capillary forces ..." but demonstrating that they do not need to be represented explicitly: at a given site, the dynamics can be summarised very simply as exponential decay, and thereby linearised via the EMA filter. We cite three hydrological papers which have used the same approach successfully. We could add a section which demonstrates how this follows from first principles, but we thought this would be overkill. We could perhaps add this in supplementary information.
  3. A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared with, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
  
  - We explicitly state that we are modelling the COSMOS observations of soil moisture, which can be interpreted loosely as near-surface soil moisture. At no point do we say that there are any "layers of the hydrological model", and the equations are explicit, so I'm not clear where the confusion arises. As the referee says, the depth to which CRNS are sensitive varies somewhat with soil moisture itself, but the measurements are always strongly weighted towards the surface soil moisture. We can make this point explicitly in the revision: the observations (and thus predictions) are subject to this varying-depth effect, and there is no simple solution to this. One could attempt an inverse modelling scheme to infer a depth profile of soil moisture, but this would be very poorly constrained by the available observations.
  4. The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
  
  - R² for every model variant is listed in Table 1, along with AIC as the more useful measure of comparative goodness-of-fit. Sure, the agreement is not perfect, but the point is that the simple linear model does better than the previous satellite estimates and the more complex models cited.
  5. I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
  
  - We are not averse to adding cross-validation in principle, but it doesn't achieve anything additional. The point of the hierarchical approach is that it treats the site-to-site variability explicitly, and estimates the global parameters having accounted for this. So in principle, we can already say how well we expect the model to do at a new site, since we have estimated the variance Ψ.
  
  One real advantage of this approach is that we can propagate this uncertainty that we know will arise at each new site into the predictions. Cross-validation is a more computationally intensive way to quantify that same site-to-site uncertainty, but does not provide an easy means of propagating that uncertainty into predictions. The strength of AIC is that, in theory, it provides a measure of out-of-sample prediction, so indicates which model should give the best prediction at sites outwith the calibration set.
  
  We propose to add some text making the above point to the revision, explaining how this method compares to cross-validation.
  6. The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.
  
  - We can add some comparison with other modelling approaches to the introduction and/or discussion. One obvious difference from mHM is the degree of complexity, since it is a system of ODEs with at least 62 parameters to be estimated, rather than a single linear equation with six parameters (Eqn 4). As an aside, the mHM paper referred to appears to do something similar to the method we describe here, albeit using very different terminology (e.g. "regionalisation").
  # Minor concerns
  
  1. The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1, and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
  
  - By contrast, referee 1 says "the paper is well structured". We explain the problem, then introduce our approach to modelling soil moisture in time (1.1) and in space (1.2), and give explicit aims (1.3). The aims only make sense in terms of the problem we are trying to solve (making accurate maps of soil moisture) and our approach to solving it (integrating disparate data sources in a hierarchical linear model), so inevitably appear later. We are not testing any hypothesis here because we are not doing an experiment. There is nothing "vague and nonscientific" about our stated aims.
  2. The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).
  
  - We accept this point. We will add some text to give better balance as the referee suggests.
  3. Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
  
  - Same point as #2 above. We will add some text to give better balance as the referee suggests.
  # Specific comments
  
  ## Abstract:
  1. The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.
  
  - The point we failed to make was that our method reduces uncertainty by integrating multiple data sources, all of which have weaknesses, but together act as a better constraint on the true soil moisture. We will add text to this effect.
  2. The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 
  
  - We accept this point, and will clarify this in the revision.
  3. Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.
  
  - We will substitute with "linear model", since this is widely understood.
  4. The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
  
  - The referee has misread the sentence. We do not say "assimilation of realtime data". We say "predictions are updated daily, lagging approximately one week behind real time"; it takes about a week for the weather and satellite data to become available. Computation time is <5 seconds for the whole domain, once the input data are available.
  ## Manuscript
  
  Line 26: Consider mentioning also the useful integration depth of this measurement technique. 
  
  - We will add text to this effect.
  Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
  
  - I think all problems apply to all, but will double-check and edit as necessary.
  Line 36: replace "are" by "and" (...influenced)
  
  - No, the "and" is on the next line. "are" is correct here.
  Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
  
  - We do not say there is "no reason for the variation", we merely say "there is no clear pattern to it", meaning we cannot interpret it in terms of the information available to us.
  Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
  
  - We can publish the code as suggested, but the model itself is only a single line of R code. Most of the code is data wrangling to change between formats and data structures for the inputs, so very task-specific and not very interesting, but happy to make public on GitHub. Unfortunately the meteorological data used is not open-access, so we can't provide a live working version, though we can provide the outputs in this way.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC2
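
The argument in the reply to major concern 5, that an estimated site-to-site variance Ψ already says how well the model should do at a new site, can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: Ψ is treated as a scalar random-intercept variance, the residual variance `sigma2` and all numerical values are invented, and posterior uncertainty in the global parameters is ignored. It is not the authors' model.

```python
import random
import statistics


def predict_new_site(global_pred, psi, sigma2, n_draws=10000, seed=0):
    """Draw predictive samples of soil moisture at an unobserved site.

    global_pred : prediction from the global (population-level) parameters
    psi         : assumed between-site variance of a random intercept
    sigma2      : assumed residual (within-site) variance
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        site_effect = rng.gauss(0.0, psi ** 0.5)   # unknown offset of the new site
        noise = rng.gauss(0.0, sigma2 ** 0.5)      # day-to-day residual error
        draws.append(global_pred + site_effect + noise)
    return draws


# Hypothetical values in volumetric soil moisture units.
draws = predict_new_site(global_pred=0.35, psi=0.0025, sigma2=0.0016)
print(statistics.mean(draws), statistics.pstdev(draws))
```

The spread of the draws approaches the square root of `psi + sigma2`, so the between-site variance widens the predictive interval at a new site directly, without refitting the model.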

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2041', Anonymous Referee #1, 27 Nov 2023

The authors have done a job on high-resolution soil moisture modeling at the UK scale. The paper is well structured, but a major revision is needed before publication. My main issues include:

1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.

2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.

3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
4. The information presented in Fig.3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.

Citation: https://doi.org/10.5194/egusphere-2023-2041-RC1
- AC1: 'Reply on RC1', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken. Their comments are shown in italics; our response is beneath in normal font.
  My main issues include:
  1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
  
  - A good idea - we will add this in the revision.
  
  2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
  - This is straightforward because the COSMOS sites are simply matched to the 2-km square they are located in. We can state this in the revision, and discuss other options (e.g. using data from the surrounding grid cells to interpolate to the COSMOS site location).
  
  3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
  - Where these do occur, it would indeed make a step change of unknown size in the parameters we are estimating. The NRFA data include meta-data on any known man-made changes of this kind, and we have tried to remove data prior to these changes where they have occurred. However, of the 1200+ catchments, this affects relatively few, most of which have been identified and removed, so we do not think this is a major problem with the analysis. We can add text to this effect in the revision to the manuscript.
  
  4. The information presented in Fig.3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
  
  - Adding rainfall to the existing figure is straightforward. Showing some contrasting months is also easy, but will require a separate figure. We can do both in the revisions.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC1
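
The matching described in the reply to comment 2 (each COSMOS site is assigned to the 2-km square that contains it) can be sketched as follows. This is an illustrative Python sketch, not the authors' code. It assumes site coordinates in a projected system in metres with the grid origin at (0, 0); the function name `grid_cell` and the site coordinates are hypothetical.

```python
import math

CELL_SIZE_M = 2000  # 2-km grid squares, as stated in the discussion


def grid_cell(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (column, row) index of the grid square containing a point.

    Assumes a projected coordinate system in metres whose origin
    coincides with the corner of grid square (0, 0).
    """
    col = math.floor(easting_m / cell_size_m)
    row = math.floor(northing_m / cell_size_m)
    return (col, row)


# Hypothetical site coordinates in metres (not real COSMOS-UK locations).
sites = {
    "site_a": (451_250.0, 203_900.0),
    "site_b": (452_100.0, 203_100.0),
}

# Each site inherits the model inputs and predictions of its containing square.
cells = {name: grid_cell(e, n) for name, (e, n) in sites.items()}
```

Under these assumptions, two sites less than 1 km apart can still fall in different squares, which is one reason the alternative mentioned in the reply (interpolating from surrounding grid cells) might be worth discussing.
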
RC2:
'Comment on egusphere-2023-2041', Anonymous Referee #2, 10 Dec 2023
The present manuscript aims at predicting soil moisture for the whole UK using a new hydrological model approach based on statistical considerations and data from discharge gauges, remote-sensing products, and cosmic-ray neutron sites. The authors explain the mathematical background of their model in detail and extensively discuss parts of the data used, their results, and the model limitations. To me the introduction of the mathematical approach is interesting, but it seems to combine a lot of new concepts and ideas, such as an EMA filter, a slope m, a mixed effects model, complex kriging algorithms, Bayesian statistics, etc. It is not clear to me to what extent this is all new or already established. It is also not clear to me how these ideas are backed by previous research. If the approach is completely new, I would relate this manuscript more to a journal for hydrological model development. The key is the invention of an (apparently) new model approach, while the use of the highly advertised COSMOS data here turned out to be just a very minor aspect of the study. Not being a hydrological modeler, I cannot evaluate the choices the authors made on the way, but I feel that comparisons to existing models are widely missing. Once the model development is accepted by the hydrological modeler community, a second paper could integrate new data sets, such as COSMOS-UK, to study its performance. Hence, I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
# Major concerns
As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.

The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.

A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to what layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. While in the present paper, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.

The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.

I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.

The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.

# Minor concerns
The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.

The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).

Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speaks of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.

# Specific comments
## Abstract:
The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.

The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 

Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.

The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?

## Manuscript
Line 26: Consider mentioning also the useful integration depth of this measurement technique. 

Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!

Line 36: replace "are" by "and" (...influenced)

Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.

Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC2
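
The metrics named in major comment 4 (RMSE, R² and KGE) respond to different kinds of mismatch, which the following Python sketch illustrates. It is an assumption-laden illustration rather than anything taken from the manuscript: R² is computed here as the squared Pearson correlation, KGE uses the usual three-component form (correlation, variability ratio, bias ratio), and the soil moisture values are invented.

```python
import numpy as np


def rmse(obs, sim):
    """Root-mean-square error between observations and predictions."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def r_squared(obs, sim):
    """Squared Pearson correlation (insensitive to constant bias and scaling)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)    # variability ratio
    beta = np.mean(sim) / np.mean(obs)   # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Invented volumetric soil moisture values (m3 m-3), for illustration only.
obs = np.array([0.30, 0.35, 0.42, 0.38, 0.33])
biased = obs + 0.05  # perfect dynamics, constant wet bias

scores = {
    "rmse": rmse(obs, biased),
    "r2": r_squared(obs, biased),
    "kge": kge(obs, biased),
}
```

In this constructed case R² is 1 even though every prediction is 0.05 too wet, while RMSE and KGE both register the bias. Reporting more than one metric therefore gives a fuller picture than any single one.
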
- AC2: 'Reply on RC2', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken and attention to detail. Their comments are shown in italics; our response is beneath in normal font.
  I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
  # Major concerns
  
  1. As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.
  
  - We accept this point, and can add some text to the introduction on existing soil moisture products.
  2. The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
  
  - We find this a strange comment. The assumptions are explicit in these lines and in the equations, as well as in the referee's comment itself. We are not "neglecting porosity, capillary forces ..." but demonstrating that they do not need to be represented explicitly: at a given site, the dynamics can be summarised very simply as exponential decay, and thereby linearised via the EMA filter. We cite three hydrological papers which have used the same approach successfully. We could add a section which demonstrates how this follows from first principles, but we thought this would be over-kill. We could add this in supplementary information perhaps.
  3. A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to what layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. While in the present paper, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
  
  - We explicitly state that we are modelling the COSMOS observations of soil moisture, which can be interpreted loosely as near-surface soil moisture. At no point do we say that there are any "layers of the hydrological model", and the equations are explicit, so I'm not clear where the confusion arises. As the referee says, the depth to which CRNS are sensitive varies somewhat with soil moisture itself, but the measurements are always strongly weighted towards the surface soil moisture. We can make this point explicitly in the revision - that the observations (and thus predictions) are subject to this varying-depth effect, and there is no simple solution to this. One could attempt an inverse modelling scheme to infer a depth profile of soil moisture, but this would be very poorly constrained by the available observations.
  4. The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
  
  - r² for every model variant is listed in Table 1, along with AIC as the more useful measure of comparative goodness-of-fit. Sure, the agreement is not perfect, but the point is that the simple linear model does better than the previous satellite estimates and the more complex models cited.
  5. I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
  
  - We are not averse to adding cross-validation in principle, but it doesn't achieve anything additional. The point of the hierarchical approach is that it treats the site-to-site variability explicitly, and estimates the global parameters having accounted for this. So in principle, we can already say how well we expect the model to do at a new site, since we have estimated the variance Ψ.
  
  One real advantage of this approach is that we can propagate this uncertainty that we know will arise at each new site into the predictions. Cross-validation is a more computationally intensive way to quantify that same site-to-site uncertainty, but does not provide an easy means of propagating that uncertainty into predictions. The strength of AIC is that, in theory, it provides a measure of out-of-sample prediction, so indicates which model should give the best prediction at sites outwith the calibration set.
  
  We propose to add some text making the above point to the revision, explaining how this method compares to cross-validation.
  6. The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.
  
  - We can add some comparison with other modelling approaches to the introduction and/or discussion. One obvious difference with mHM is the degree of complexity, since it is a system of ODEs with at least 62 parameters to be estimated, rather than a single linear equation with six parameters (Eqn 4). As an aside, the mHM paper referred to appears to do something similar to the method we describe here, albeit using very different terminology (e.g. "regionalisation").
  # Minor concerns
  
  1. The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
  
  - By contrast, referee 1 says "the paper is well structured". We explain the problem, then introduce our approach to modelling soil moisture in time (1.1) and in space (1.2), and give explicit aims (1.3). The aims only make sense in terms of the problem we are trying to solve (making accurate maps of soil moisture) and our approach to solving it (integrating disparate data sources in a hierarchical linear model), so inevitably appear later. We are not testing any hypothesis here because we are not doing an experiment. There is nothing "vague and nonscientific" about our stated aims.
  2. The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).
  
  - We accept this point. We will add some text to give better balance as the referee suggests.
  3. Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speaks of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
  
  - Same point as #2 above. We will add some text to give better balance as the referee suggests.
  # Specific comments
  
  ## Abstract:
  1. The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.
  
  - The point we failed to make was that our method reduces uncertainty by integrating multiple data sources, all of which have weaknesses, but together act as a better constraint on the true soil moisture. We will add text to this effect.
  2. The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 
  
  - We accept this point and will clarify this in the revision.
  3. Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.
  
  - We will substitute with "linear model", since this is widely understood.
  4. The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
  
  - The referee has misread the sentence. We do not say "assimilation of realtime data". We say "predictions are updated daily, lagging approximately one week behind real time"; it takes about a week for the weather and satellite data to become available. Computation time is <5 seconds for the whole domain, once the input data are available.
  ## Manuscript
  
  Line 26: Consider mentioning also the useful integration depth of this measurement technique. 
  
  - We will add text to this effect.
  Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
  
  - I think all problems apply to all, but will double-check and edit as necessary.
  Line 36: replace "are" by "and" (...influenced)
  
  - No, the "and" is on the next line. "are" is correct here.
  Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
  
  - We do not say there is "no reason for the variation", we merely say "there is no clear pattern to it", meaning we cannot interpret it in terms of the information available to us.
  Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
  
  - We can publish the code as suggested, but the model itself is only a single line of R code. Most of the code is data wrangling to change between formats and data structures for the inputs, so very task-specific and not very interesting, but happy to make public on GitHub. Unfortunately the meteorological data used is not open-access, so we can't provide a live working version, though we can provide the outputs in this way.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC2
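
The authors' reply to major concern 2 describes summarising soil moisture dynamics as exponential decay, linearised via an EMA (exponential moving average) filter. The Python sketch below illustrates that general idea only. It is not the paper's Eqn 4 (which has six parameters): the smoothing parameter `alpha`, the `intercept` and `slope` coefficients, and the rainfall values are all hypothetical.

```python
import numpy as np


def ema(x, alpha):
    """Exponential moving average: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    The filter state before the first time step is assumed to be zero.
    """
    s = np.zeros(len(x))
    prev = 0.0
    for t, xt in enumerate(x):
        prev = alpha * xt + (1.0 - alpha) * prev
        s[t] = prev
    return s


# A single rainfall pulse followed by dry days (hypothetical values, mm/day).
precip = np.array([10.0, 0.0, 0.0, 0.0, 0.0])
alpha = 0.2  # hypothetical smoothing parameter

filtered = ema(precip, alpha)
# After the pulse, the filtered series decays geometrically (discrete-time
# exponential decay): filtered[t] = alpha * 10 * (1 - alpha) ** t.

# Soil moisture is then modelled as a linear function of the filtered input
# (hypothetical coefficients, chosen only to give plausible magnitudes).
intercept, slope = 0.25, 0.05
theta = intercept + slope * filtered
```

The design point is that the nonlinear pulse-and-decay behaviour is absorbed into the filtered input, so the remaining relationship can be fitted as an ordinary linear model.
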

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (13 Feb 2024) by Gerrit H. de Rooij

Dear authors,

The reviews and your replies are such that I believe that a revised version of the paper can be suitable for publication. I therefore request you to provide a revised version of the paper. I made a few notes when I was studying the discussion that I reproduce below, in the hope that they will be of benefit when you revise the paper.

Sincerely yours,

Gerrit de Rooij
Editor

Referee 1

In your reply to the second comment you state that ‘the COSMOS sites are simply matched to the 2-km square they are located in’, but you do not explain how the matching was performed. Did you simply equate the values for which matching was required?

Referee 2

This referee is the more critical of the two. From the discussion I have the impression that you (the authors) and the referee approach the subject from very different viewpoints. At times this leads to differences of opinion that I consider part of the scientific debate (and therefore not a ground for rejection), and at other times to misunderstandings.

In the former case, the discussion with the referee can be incorporated in the paper by devoting some space in the Introduction to the literature that represents alternative approaches. The referee alludes to this by suggesting to review the literature on hydrological modelling on the relevant scales (main comments 1 and 6). I would like to add that the paper in its current form is somewhat slanted towards the remote sensing aspects and could be more even-handed by devoting attention to the hydrological aspects of the study. This will take some effort but is quite doable, in my assessment. This will help you to better define what the added value of your model is, vis-a-vis the suite of available models. You already do so somewhat tentatively in the paper, and more pointedly in your reply to Referee 2. I therefore suspect you very well know what the contribution of your work is, you only need to make sure that the reader knows as well.

The misunderstandings can help you to clarify the paper, especially for those readers who have backgrounds and research interests that are different from yours.

Main comment 2. I am not sure how well you can derive the exponential decay from first principles of soil physics, but I do not think it is necessary – you have several papers to back up the approach.

Main comment 3. In the discussion of this point, the contrasting vantage points of the authors on one hand and the referee on the other are very apparent. I believe you can use the discussion here to clarify the paper for the more hydrologically inclined readers, and also to select additional literature to discuss in the introduction to make the paper more balanced. It may also prove worthwhile to briefly point out the strengths and weaknesses of either viewpoint, which can then result in a line of argument that supports the choice for the type of modelling approach you are advocating in the paper. You already initiated the development of such an argument in your reply to the comment.

Main comment 5. Your reply to this comment is quite interesting. Please work it into the paper in one way or another.
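
The argument in the authors' reply to main comment 5 (that the estimated site-to-site variance Ψ can be propagated into predictions at a new site) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the model is reduced to a global mean plus a random site intercept, Ψ is treated as a scalar variance, and the "posterior draws" are simulated stand-ins rather than MCMC output from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 20_000

# Simulated stand-ins for posterior draws (hypothetical values, m3 m-3).
mu = rng.normal(0.35, 0.01, n_draws)                # global mean soil moisture
psi_sd = np.abs(rng.normal(0.05, 0.005, n_draws))   # between-site SD, sqrt(Psi)
sigma = np.abs(rng.normal(0.03, 0.003, n_draws))    # residual (within-site) SD

# At a new site the site effect is unknown, so for each posterior draw it is
# sampled from N(0, Psi) and added to the global mean and residual noise.
site_effect_new = rng.normal(0.0, psi_sd)
pred_new_site = mu + site_effect_new + rng.normal(0.0, sigma)

# For comparison: predictions that ignore site-to-site variability.
pred_ignoring_sites = mu + rng.normal(0.0, sigma)

sd_new_site = float(pred_new_site.std())
sd_ignoring_sites = float(pred_ignoring_sites.std())
```

In this toy setting the predictive spread at a new site is visibly wider than the spread obtained when site-to-site variability is ignored, which is the uncertainty the hierarchical approach is said to carry through to the mapped predictions.
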

AR by Peter E. Levy on behalf of the Authors (26 Jul 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (29 Jul 2024) by Gerrit H. de Rooij

RR by Anonymous Referee #1 (29 Aug 2024)

ED: Publish subject to technical corrections (30 Aug 2024) by Gerrit H. de Rooij

AR by Peter E. Levy on behalf of the Authors (10 Sep 2024) Manuscript

Journal article(s) based on this preprint

06 Nov 2024

Mapping soil moisture across the UK: assimilating cosmic-ray neutron sensors, remotely sensed indices, rainfall radar and catchment water balance data in a Bayesian hierarchical model

Peter E. Levy and the COSMOS-UK team

Hydrol. Earth Syst. Sci., 28, 4819–4836, https://doi.org/10.5194/hess-28-4819-2024, 2024

Viewed

Total article views: 767 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
535	195	37	767	33	32

Views and downloads (calculated since 12 Sep 2023)

Month	HTML	PDF	XML	Total
Sep 2023	139	61	7	207
Oct 2023	82	33	2	117
Nov 2023	30	8	1	39
Dec 2023	31	12	4	47
Jan 2024	22	7	1	30
Feb 2024	41	14	5	60
Mar 2024	19	12	2	33
Apr 2024	12	8	4	24
May 2024	27	15	2	44
Jun 2024	47	11	4	62
Jul 2024	28	4	2	34
Aug 2024	15	2	2	19
Sep 2024	17	3	0	20
Oct 2024	21	5	1	27
Nov 2024	4	0	4

Viewed (geographical distribution)

Total article views: 738 (including HTML, PDF, and XML) Thereof 738 with geography defined and 0 with unknown origin.

Latest update: 06 Nov 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

Having accurate up-to-date maps of soil moisture is important for many purposes. However, current modelled and remotely-sensed maps are rather coarse and not very accurate. Here, we demonstrate a simple but accurate approach which is closely linked to direct measurements of soil moisture at a network of sites across the UK, and to the water balance (precipitation minus drainage and evaporation) measured at a large number of catchments (1212), as well as to remotely-sensed satellite estimates.

