the Creative Commons Attribution 4.0 License.
Mapping soil moisture across the UK: assimilating cosmic-ray neutron sensors, remotely-sensed indices, rainfall radar and catchment water balance data in a Bayesian hierarchical model
Abstract. Soil moisture is important in many hydrological and ecological processes. However, data sets which are currently available have issues with accuracy and resolution. To translate remotely-sensed data to an absolute measure of soil moisture requires mapped estimates of soil hydrological properties and estimates of vegetation properties, and this introduces considerable uncertainty. We present an alternative methodology for producing daily maps of soil moisture over the UK at 2-km resolution ("SMUK"). The method is based on a simple empirical model, calibrated with five years of daily data from cosmic-ray neutron sensors at ~40 sites across the country. The model is driven by precipitation, humidity, a remotely-sensed "soil water index" satellite product, and soil porosity. The model explains around 70 % of the variance in the daily observations. The spatial variation in the parameter describing the soil water retention (and thereby the response to precipitation) was estimated using daily water balance data from ~1200 catchments with good coverage across the country. The model parameters were estimated by Bayesian calibration using a Markov chain Monte Carlo method, so as to characterise the posterior uncertainty in the parameters and predictions. We found that the simple model could emulate the behaviour of a more complex process-based model. Given the high resolution of the inputs in time and space, the model can predict the very detailed variation in soil moisture which arises from the sporadic nature of precipitation events, including the small-scale and short-term variations associated with orographic and convective rainfall. Predictions over the period 2016 to 2023 demonstrated realistic patterns following the passage of weather fronts and prolonged droughts. The model has negligible computation time, and inputs and predictions are updated daily, lagging approximately one week behind real time.
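The reviews and replies below describe the temporal core of the model as an exponential decay of soil moisture after each rainfall pulse, linearised with an exponential moving average (EMA) filter. Below is a minimal Python sketch of that idea. The smoothing constant `alpha`, the linear coefficients `b0` and `b1`, and the rainfall series are illustrative assumptions. They are not the paper's parameters or its Eqn 4.

```python
def ema_filter(precip, alpha):
    """Exponential moving average of a daily precipitation series.

    Each day the running value decays by a factor (1 - alpha) and the new
    day's rainfall enters with weight alpha, so a single rainfall pulse
    produces an exponentially decaying response.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = 0.0  # assumed initial condition: no antecedent rainfall
    for p in precip:
        state = alpha * p + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def predict_soil_moisture(precip, alpha, b0, b1):
    """Hypothetical linear predictor: soil moisture = b0 + b1 * EMA(precip)."""
    return [b0 + b1 * x for x in ema_filter(precip, alpha)]


# A single 10 mm rainfall pulse followed by dry days.
rain = [10.0, 0.0, 0.0, 0.0]
print(ema_filter(rain, alpha=0.5))  # [5.0, 2.5, 1.25, 0.625]
```

Once `alpha` is fixed, the predictor is linear in `b0` and `b1`. This is one way to read the authors' statement that the dynamics are "linearised via the EMA filter".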
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2041', Anonymous Referee #1, 27 Nov 2023
The authors have carried out work on high-resolution soil moisture modelling at the UK scale. The paper is well structured, but a major revision is needed before publication. My main issues include:
1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
4. The information presented in Fig. 3 is not clear; please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC1

AC1: 'Reply on RC1', Peter E. Levy, 08 Feb 2024
We thank the referee for the time taken. Their comments are quoted below, each followed by our response.
My main issues include:
1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
- A good idea; we will add this in the revision.
2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
- This is straightforward because each COSMOS site is simply matched to the 2-km grid square in which it is located. We can state this in the revision, and discuss other options (e.g. using data from the surrounding grid cells to interpolate to the COSMOS site location).
3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
- Where these do occur, they would indeed cause a step change of unknown size in the parameters we are estimating. The NRFA data include metadata on any known man-made changes of this kind, and we have tried to remove data prior to these changes where they have occurred. However, of the 1200+ catchments, relatively few are affected, and most of these have been identified and removed, so we do not think this is a major problem for the analysis. We can add text to this effect in the revised manuscript.
4. The information presented in Fig. 3 is not clear; please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
- Adding rainfall to the existing figure is straightforward. Showing some contrasting months is also easy, but will require a separate figure. We can do both in the revision.
Citation: https://doi.org/10.5194/egusphere-2023-2041-AC1
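The reply to point 2 assigns each COSMOS site to the 2-km grid square that contains it. This amounts to floor division of the site coordinates by the cell size. The sketch below assumes British National Grid easting/northing in metres and a grid whose cells are aligned to the grid origin. The function names and the example coordinates are hypothetical and do not correspond to a real COSMOS-UK site.

```python
CELL_SIZE_M = 2000  # 2-km grid resolution


def grid_cell(easting, northing, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid square containing a point.

    Assumes projected coordinates in metres (e.g. British National Grid)
    and cells aligned to the origin (0, 0); floor division keeps points
    on a cell's lower/left edge in that cell.
    """
    return (int(easting // cell_size), int(northing // cell_size))


def cell_centre(col, row, cell_size=CELL_SIZE_M):
    """Coordinates of the centre of grid square (col, row)."""
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


# Hypothetical site location, in metres.
site = (327_450.0, 779_830.0)
col, row = grid_cell(*site)
print(col, row)               # 163 389
print(cell_centre(col, row))  # (327000.0, 779000.0)
```

The alternative mentioned in the reply, interpolating from surrounding cells, would instead use the offset between the site and neighbouring cell centres (e.g. bilinear weights) rather than a single containing cell.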
 
RC2: 'Comment on egusphere-2023-2041', Anonymous Referee #2, 10 Dec 2023
The present manuscript aims to predict soil moisture for the whole UK using a new hydrological modelling approach based on statistical considerations and data from discharge gauges, remote-sensing products, and cosmic-ray neutron sites. The authors explain the mathematical background of their model in detail and extensively discuss parts of the data used, their results, and the model limitations. To me the introduction of the mathematical approach is interesting, but it seems to combine a lot of new concepts and ideas, such as an EMA filter, a slope m, a mixed-effects model, complex kriging algorithms, Bayesian statistics, etc. It is not clear to me to what extent this is all new or already established. It is also not clear to me how these ideas are backed by previous research. If the approach is completely new, I would relate this manuscript more to a journal for hydrological model development. The key is the invention of an (apparently) new model approach, while the use of the highly advertised COSMOS data here turned out to be just a very minor aspect of the study. Not being a hydrological modeller, I cannot evaluate the choices the authors made along the way, but I feel that comparisons to existing models are largely missing. Once the model development is accepted by the hydrological modelling community, a second paper could integrate new data sets, such as COSMOS-UK, to study its performance. Hence, I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
# Major concerns
- As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones, and whether this choice is adequate compared to the performance of existing models.
- The authors present their "simple" model with a number of unclear assumptions (Lines 58-70), e.g. treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g. using insights from existing literature. The whole section hardly names any hydrological paper to support the choice of assumptions, which would be OK for the first hydrological model invented in 1950, but not in 2023.
- A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and which depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint of how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
- The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
- I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis so that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
- The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles and physical parameters, and are still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydrological models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences from this and other existing models in terms of methodology, speed, and quality of results.
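For reference, the Kling-Gupta efficiency (KGE) mentioned by the referee combines three components: the correlation r, the variability ratio α = σ_sim/σ_obs and the bias ratio β = μ_sim/μ_obs. They combine as KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²) (Gupta et al., 2009). Below is a minimal Python sketch that computes r² as the squared Pearson correlation alongside KGE. The example series are made up for illustration.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def kge(obs, sim):
    """Kling-Gupta efficiency; 1 indicates a perfect match."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)   # variability ratio
    beta = _mean(sim) / _mean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [0.30, 0.35, 0.40, 0.38, 0.33]  # made-up volumetric soil moisture
sim = [x + 0.05 for x in obs]         # perfectly correlated, biased upwards
print(round(pearson_r(obs, sim) ** 2, 3))  # 1.0
print(round(kge(obs, sim), 3))             # 0.858
```

The example illustrates why a single metric can mislead. A uniformly biased prediction has r² = 1, while KGE penalises the bias through β.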
# Minor concerns
- The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydrological model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1, and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
- The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses on the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).
- Section 2.1.1: A proper and unbiased introduction of the COSMOS technique, which is, as advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information for assessing the performance and quality of your results. The fact that COSMOS data are calibrated on actual soil moisture is also very relevant, because neutron counts are a relative quantity, just like the remote sensing data you criticise. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth; why do you mention max. 30 cm depth here? The answer is the wet soil in the UK, which brings us back to the fact that the limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
# Specific comments
## Abstract:
- The abstract is not logical, or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argument for why this solution solves the previously mentioned issues while it again makes use of them.
- The study was further motivated by the fact that remote sensing data have difficulty providing absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no further mention of absolute SM predictions (at least in the abstract). If you raise an issue at the beginning, the reader would expect a reference to it at the end of the story.
- Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mention using as a benchmark.
- The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
## Manuscript
- Line 26: Consider also mentioning the useful integration depth of this measurement technique.
- Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
- Line 36: replace "are" by "and" (...influenced)
- Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
- Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as is common standard for other hydrological models.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC2

AC2: 'Reply on RC2', Peter E. Levy, 08 Feb 2024
We thank the referee for the time taken and attention to detail. Their comments are quoted below, each followed by our response.
I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
# Major concerns
1. As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones, and whether this choice is adequate compared to the performance of existing models.
- We accept this point, and can add some text to the introduction on existing soil moisture products.
2. The authors present their "simple" model with a number of unclear assumptions (Lines 58-70), e.g. treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g. using insights from existing literature. The whole section hardly names any hydrological paper to support the choice of assumptions, which would be OK for the first hydrological model invented in 1950, but not in 2023.
- We find this a strange comment. The assumptions are explicit in these lines and in the equations, as well as in the referee's comment itself. We are not "neglecting porosity, capillary forces ..." but demonstrating that they do not need to be represented explicitly: at a given site, the dynamics can be summarised very simply as exponential decay, and thereby linearised via the EMA filter. We cite three hydrological papers which have used the same approach successfully. We could add a section which demonstrates how this follows from first principles, but we thought this would be overkill; perhaps we could add it as supplementary information.
3. A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and which depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint of how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
- We explicitly state that we are modelling the COSMOS observations of soil moisture, which can be interpreted loosely as near-surface soil moisture. At no point do we say that there are any "layers of the hydrological model", and the equations are explicit, so I'm not clear where the confusion arises. As the referee says, the depth to which CRNS are sensitive varies somewhat with soil moisture itself, but the signal is always strongly weighted towards the surface. We can make this point explicitly in the revision: the observations (and thus the predictions) are subject to this varying-depth effect, and there is no simple solution to this. One could attempt an inverse modelling scheme to infer a depth profile of soil moisture, but this would be very poorly constrained by the available observations.
4. The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
- r² for every model variant is listed in Table 1, along with AIC as the more useful measure of comparative goodness-of-fit. Sure, the agreement is not perfect, but the point is that the simple linear model does better than the previous satellite estimates and the more complex models cited.
5. I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis so that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
 - We are not averse to adding cross-validation in principle, but it doesn't achieve anything additional. The point of the hierarchical approach is that it treats the site-to-site variability explicitly, and estimates the global parameters having accounted for this. So in principle, we can already say how well we expect the model to do at a new site, since we have estimated the variance Ψ.
 One real advantage of this approach is that we can propagate this uncertainty that we know will arise at each new site into the predictions. Cross-validation is a more computationally intensive way to quantify that same site-to-site uncertainty, but does not provide an easy means of propagating that uncertainty into predictions. The strength of AIC is that, in theory, it provides a measure of out-of-sample prediction, so indicates which model should give the best prediction at sites outwith the calibration set.
 We propose to add some text making the above point to the revision, explaining how this method compares to cross-validation.
6. The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles and physical parameters, and are still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydrological models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences from this and other existing models in terms of methodology, speed, and quality of results.
- We can add some comparison with other modelling approaches to the introduction and/or discussion. One obvious difference from mHM is the degree of complexity: it is a system of ODEs with at least 62 parameters to be estimated, rather than a single linear equation with six parameters (Eqn 4). As an aside, the mHM paper referred to appears to do something similar to the method we describe here, albeit using very different terminology (e.g. "regionalisation").
# Minor concerns
1. The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydrological model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1, and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
- By contrast, referee 1 says "the paper is well structured". We explain the problem, then introduce our approach to modelling soil moisture in time (1.1) and in space (1.2), and give explicit aims (1.3). The aims only make sense in terms of the problem we are trying to solve (making accurate maps of soil moisture) and our approach to solving it (integrating disparate data sources in a hierarchical linear model), so they inevitably appear later. We are not testing any hypothesis here because we are not doing an experiment. There is nothing "vague and nonscientific" about our stated aims.
2. The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses on the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).
- We accept this point. We will add some text to give better balance, as the referee suggests.
3. Section 2.1.1: A proper and unbiased introduction of the COSMOS technique, which is, as advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information for assessing the performance and quality of your results. The fact that COSMOS data are calibrated on actual soil moisture is also very relevant, because neutron counts are a relative quantity, just like the remote sensing data you criticise. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth; why do you mention max. 30 cm depth here? The answer is the wet soil in the UK, which brings us back to the fact that the limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
- Same point as #2 above. We will add some text to give better balance, as the referee suggests.
# Specific comments
## Abstract:
1. The abstract is not logical, or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argument for why this solution solves the previously mentioned issues while it again makes use of them.
- The point we failed to make was that our method reduces uncertainty by integrating multiple data sources, all of which have weaknesses, but which together act as a better constraint on the true soil moisture. We will add text to this effect.
2. The study was further motivated by the fact that remote sensing data have difficulty providing absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no further mention of absolute SM predictions (at least in the abstract). If you raise an issue at the beginning, the reader would expect a reference to it at the end of the story.
- We accept this point, and will clarify this in the revision.
3. Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mention using as a benchmark.
- We will substitute "linear model", since this term is widely understood.
4. The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
- The referee has misread the sentence. We do not say "assimilation of realtime data". We say "predictions are updated daily, lagging approximately one week behind real time"; it takes about a week for the weather and satellite data to become available. Computation time is <5 seconds for the whole domain, once the input data are available.
## Manuscript
Line 26: Consider also mentioning the useful integration depth of this measurement technique.
- We will add text to this effect.
Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
- I think all of the problems apply to all of the cited works, but we will double-check and edit as necessary.
Line 36: replace "are" by "and" (...influenced)
- No, the "and" is on the next line; "are" is correct here.
Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
- We do not say there is "no reason for the variation"; we merely say "there is no clear pattern to it", meaning we cannot interpret it in terms of the information available to us.
Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as is common standard for other hydrological models.
- We can publish the code as suggested, but the model itself is only a single line of R code. Most of the code is data wrangling to convert between formats and data structures for the inputs, so it is very task-specific and not very interesting, but we are happy to make it public on GitHub. Unfortunately the meteorological data used are not open-access, so we cannot provide a live working version, though we can provide the outputs in this way.
Citation: https://doi.org/10.5194/egusphere-2023-2041-AC2
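The authors' reply to major concern 5 argues that the hierarchical model estimates the between-site variance Ψ. On that basis, the expected error at an unobserved site can be propagated into predictions without cross-validation. Below is a minimal Monte Carlo sketch of that idea. It assumes, hypothetically, a single site-level random intercept drawn from a normal distribution with variance Ψ (`psi`) plus a residual with variance σ² (`sigma2`). All numbers and names are illustrative and are not the paper's estimates.

```python
import random
import statistics


def predict_new_site(mean_prediction, psi, sigma2, n_draws=10_000, seed=1):
    """Monte Carlo predictive draws for a site outside the calibration set.

    Assumed model: y = mean_prediction + u + e, where the site effect
    u ~ N(0, psi) is unknown for a new site and the residual e ~ N(0, sigma2).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        u = rng.gauss(0.0, psi ** 0.5)     # random.gauss takes a std dev
        e = rng.gauss(0.0, sigma2 ** 0.5)
        draws.append(mean_prediction + u + e)
    return draws


draws = predict_new_site(mean_prediction=0.35, psi=0.003, sigma2=0.001)
# For this additive model the predictive variance is psi + sigma2 = 0.004.
print(statistics.mean(draws), statistics.variance(draws))
```

For this simple additive case, psi + sigma2 gives the answer analytically. The sampling form becomes useful when posterior draws of Ψ and the other parameters, e.g. from the MCMC output, replace the fixed values, because the same loop then carries parameter uncertainty into the prediction as well.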
 
Interactive discussion
Status: closed
- 
                     RC1:  'Comment on egusphere-2023-2041', Anonymous Referee #1, 27 Nov 2023
            
            
            
            
                        The authors have done a job on high-resolution soil moisture modeling at the UK scale. The paper is well structured, but a major revision is needed before publication. My main issues include: 
 1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
 2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
 3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?4. The information presented in Fig.3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons. Citation: https://doi.org/10.5194/egusphere-2023-2041-RC1 - 
                                        
                                     AC1:  'Reply on RC1', Peter E. Levy, 08 Feb 2024
                            
                            
                            
                            
                                        We thank the referee for the time taken. Their comments are shown in italics; our response is beneath in normal font. My main issues include: 1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data. 
 - A good idea - we will add this in the revision.
2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
- This is straightforward because the COSMOS sites are simply matched to the 2-km square they are located in. We can state this in the revision, and discuss other options (e.g. using data from the surrounding grid cells to interpolate to the COSMOS site location).
3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
- Where these do occur, it would indeed make a step change of unknown size in the parameters we are estimating. The NRFA data include meta-data on any known man-made changes of this kind, and we have tried to remove data prior to these changes where they have occurred. However, of the 1200+ catchments, this affects relatively few, most of which have been identified and removed, so we do not think this is a major problem with the analysis. We can add text to this effect in the revision to the manuscript.
 4. The information presented in Fig.3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
- Adding rainfall to the existing figure is straightforward. Showing some contrasting months is also easy, but will require a separate figure. We can do both in the revisions.

Citation: https://doi.org/10.5194/egusphere-2023-2041-AC1
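The reply to point 2 matches each COSMOS site to the 2-km square that contains it. A minimal Python sketch of that lookup, assuming site coordinates are British National Grid eastings and northings in metres and that the model grid's lower-left corner sits at the grid origin (0, 0); the site name and coordinates in the example are made up.

```python
from typing import Dict, Tuple

CELL_SIZE_M = 2000.0  # 2-km model grid


def grid_cell(easting: float, northing: float,
              origin_e: float = 0.0, origin_n: float = 0.0,
              cell: float = CELL_SIZE_M) -> Tuple[int, int]:
    """Return the (column, row) of the grid square containing a point.

    Floor division puts a point lying exactly on a cell edge into the
    cell to its east/north.
    """
    col = int((easting - origin_e) // cell)
    row = int((northing - origin_n) // cell)
    return col, row


def match_sites(sites: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[int, int]]:
    """Map each site name to the grid cell it falls in."""
    return {name: grid_cell(e, n) for name, (e, n) in sites.items()}


if __name__ == "__main__":
    # Hypothetical site coordinates (easting, northing) in metres.
    print(match_sites({"SITE_A": (461000.0, 185500.0)}))  # {'SITE_A': (230, 92)}
```

The interpolation alternative mentioned in the reply would instead combine values from the neighbouring cells, for example weighted by their distance to the site.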
 
- RC2: 'Comment on egusphere-2023-2041', Anonymous Referee #2, 10 Dec 2023

The present manuscript aims at predicting soil moisture for the whole UK using a new hydrological model approach based on statistical considerations and data from discharge gauges, remote-sensing products, and cosmic-ray neutron sites. The authors explain the mathematical background of their model in detail and extensively discuss parts of the used data, their results, and the model limitations. To me the introduction of the mathematical approach reads as interesting, but it seems to combine a lot of new concepts and ideas, such as an EMA filter, a slope m, a mixed effects model, complex kriging algorithms, Bayesian statistics, etc. It is not clear to me to what extent this is all new or already established. It is also not clear to me how these ideas are backed by previous research. If the approach is completely new, I would relate this manuscript more to a journal for hydrological model development. The key is the invention of an (apparently) new model approach, while the use of the highly advertised COSMOS data here turned out to be just a very minor aspect of the study. Not being a hydrological modeler, I cannot evaluate the choices the authors made on the way, but I feel that comparisons to existing models are largely missing. Once the model development is accepted by the hydrological modeler community, a second paper could integrate new data sets, such as COSMOS-UK, to study its performance. Hence, I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.

# Major concerns
- As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.
- The authors present their "simple" model with a number of unclear assumptions (Lines 58-70), e.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly cites any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
- A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and which (unfortunately) depends on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint of how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
- The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
- I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
- The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles and physical parameters, and are still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.
# Minor concerns
- The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used seems to be part of the introduction, too. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
- The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses on the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc.).
- Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are sensitive not only to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth, so why do you mention max. 30 cm depth here? The answer is the wet soil in the UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
# Specific comments
## Abstract:
- The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.
- The study was further motivated by the fact that remote sensing data have difficulty providing absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story.

- Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.
- The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?

## Manuscript
- Line 26: Consider mentioning also the useful integration depth of this measurement technique.

- Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
- Line 36: replace "are" by "and" (...influenced)
- Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
- Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC2
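The reviewer asks for KGE or R² alongside RMSE. A minimal Python sketch of both metrics, using the standard Kling-Gupta efficiency, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the Pearson correlation, α the ratio of predicted to observed standard deviation and β the ratio of predicted to observed means. The soil-moisture values in the example are made up.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared correlation: insensitive to bias and to errors in variability."""
    return pearson_r(obs, sim) ** 2


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency; 1 is a perfect match."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    obs = [0.20, 0.30, 0.40]
    sim = [0.30, 0.40, 0.50]  # same dynamics, constant +0.1 bias
    print(r_squared(obs, sim), kge(obs, sim))
```

In the example r² is 1 while KGE is about 0.67, because KGE penalises the constant bias that r² ignores; this is why a single metric can hide either biases or differences in dynamics.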
                                        
- AC2: 'Reply on RC2', Peter E. Levy, 08 Feb 2024

We thank the referee for the time taken and attention to detail. Their comments are shown in italics; our response is beneath in normal font.

I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.

# Major concerns
1. As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.
- We accept this point, and can add some text to the introduction on existing soil moisture products.
2. The authors present their "simple" model with a number of unclear assumptions (Lines 58-70), e.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly cites any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
- We find this a strange comment. The assumptions are explicit in these lines and in the equations, as well as in the referee's comment itself. We are not "neglecting porosity, capillary forces ..." but demonstrating that they do not need to be represented explicitly: at a given site, the dynamics can be summarised very simply as exponential decay, and thereby linearised via the EMA filter. We cite three hydrological papers which have used the same approach successfully. We could add a section which demonstrates how this follows from first principles, but we thought this would be overkill. We could add this in supplementary information perhaps.
3. A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and which (unfortunately) depends on the soil moisture profile itself. It changes over time. And it is not trivial to decide which layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. In the present paper, however, I cannot find any hint of how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
- We explicitly state that we are modelling the COSMOS observations of soil moisture, which can be interpreted loosely as near-surface soil moisture. At no point do we say that there are any "layers of the hydrological model", and the equations are explicit, so I'm not clear where the confusion arises. As the referee says, the depth to which CRNS are sensitive varies somewhat with soil moisture itself, but the signal is always strongly weighted towards the surface soil moisture. We can make this point explicitly in the revision: the observations (and thus predictions) are subject to this varying-depth effect, and there is no simple solution to this. One could attempt an inverse modelling scheme to infer a depth profile of soil moisture, but this would be very poorly constrained by the available observations.
4. The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
- r² for every model variant is listed in Table 1, along with AIC as the more useful measure of comparative goodness-of-fit. Sure, the agreement is not perfect, but the point is that the simple linear model does better than the previous satellite estimates and the more complex models cited.
5. I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
 - We are not averse to adding cross-validation in principle, but it doesn't achieve anything additional. The point of the hierarchical approach is that it treats the site-to-site variability explicitly, and estimates the global parameters having accounted for this. So in principle, we can already say how well we expect the model to do at a new site, since we have estimated the variance Ψ.
 One real advantage of this approach is that we can propagate this uncertainty that we know will arise at each new site into the predictions. Cross-validation is a more computationally intensive way to quantify that same site-to-site uncertainty, but does not provide an easy means of propagating that uncertainty into predictions. The strength of AIC is that, in theory, it provides a measure of out-of-sample prediction, so indicates which model should give the best prediction at sites outwith the calibration set.
We propose to add some text making the above point to the revision, explaining how this method compares to cross-validation.
6. The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles and physical parameters, and are still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.
- We can add some comparison with other modelling approaches to the introduction and/or discussion. One obvious difference with mHM is the degree of complexity, since it is a system of ODEs with at least 62 parameters to be estimated, rather than a single linear equation with six parameters (Eqn 4). As an aside, the mHM paper referred to appears to do something similar to the method we describe here, albeit using very different terminology (e.g. "regionalisation").

# Minor concerns
1. The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used seems to be part of the introduction, too. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
- By contrast, referee 1 says "the paper is well structured". We explain the problem, then introduce our approach to modelling soil moisture in time (1.1) and in space (1.2), and give explicit aims (1.3). The aims only make sense in terms of the problem we are trying to solve (making accurate maps of soil moisture) and our approach to solving it (integrating disparate data sources in a hierarchical linear model), so they inevitably appear later. We are not testing any hypothesis here because we are not doing an experiment. There is nothing "vague and nonscientific" about our stated aims.
2. The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses on the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc.).
- We accept this point. We will add some text to give better balance as the referee suggests.
3. Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are sensitive not only to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speak of 15 to 80 cm of sensing depth, so why do you mention max. 30 cm depth here? The answer is the wet soil in the UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
- Same point as #2 above. We will add some text to give better balance as the referee suggests.

# Specific comments
## Abstract:
1. The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.
- The point we failed to make was that our method reduces uncertainty by integrating multiple data sources, all of which have weaknesses, but which together act as a better constraint on the true soil moisture. We will add text to this effect.
2. The study was further motivated by the fact that remote sensing data have difficulty providing absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story.
 
- We accept this point, and will clarify this in the revision.
3. Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.
- We will substitute with "linear model", since this is widely understood.
4. The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
 
- The referee has misread the sentence. We do not say "assimilation of realtime data". We say "predictions are updated daily, lagging approximately one week behind real time"; it takes about a week for the weather and satellite data to become available. Computation time is <5 seconds for the whole domain, once the input data are available.

## Manuscript
 Line 26: Consider mentioning also the useful integration depth of this measurement technique.

- We will add text to this effect.
Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
- I think all problems apply to all, but will double-check and edit as necessary.
Line 36: replace "are" by "and" (...influenced)
- No, the "and" is on the next line. "are" is correct here.
Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
- We do not say there is "no reason for the variation", we merely say "there is no clear pattern to it", meaning we cannot interpret it in terms of the information available to us.
Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
- We can publish the code as suggested, but the model itself is only a single line of R code. Most of the code is data wrangling to change between formats and data structures for the inputs, so it is very task-specific and not very interesting, but we are happy to make it public on GitHub. Unfortunately the meteorological data used are not open-access, so we can't provide a live working version, though we can provide the outputs in this way.
Citation: https://doi.org/10.5194/egusphere-2023-2041-AC2
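The reply to point 5 argues that, because the hierarchical model estimates the site-to-site variance Ψ, the extra uncertainty at an uncalibrated site can be propagated directly into predictions. A minimal Python sketch of that argument, under a deliberately simplified assumption: a single site-level random intercept with variance Ψ (the paper's Ψ may cover several site-level parameters) plus independent residual variance σ², ignoring uncertainty in the fitted parameters. At a calibrated site the predictive spread is then σ; at a new site it is sqrt(σ² + Ψ). The numbers are illustrative, not estimates from the paper.

```python
import math
import random
from statistics import pstdev
from typing import List


def predictive_sd(sigma: float, psi: float, new_site: bool) -> float:
    """Predictive SD for a random-intercept model y = mu + u_site + e.

    u_site ~ N(0, psi) and e ~ N(0, sigma**2). At a calibrated site u_site
    is treated as known, so only sigma remains; at a new site both add.
    """
    return math.sqrt(sigma ** 2 + psi) if new_site else sigma


def simulate_new_site(mu: float, sigma: float, psi: float,
                      n: int, seed: int = 1) -> List[float]:
    """Monte Carlo propagation: draw a new site effect, then residual noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u_new = rng.gauss(0.0, math.sqrt(psi))  # unknown site offset
        e = rng.gauss(0.0, sigma)               # day-to-day residual
        draws.append(mu + u_new + e)
    return draws


if __name__ == "__main__":
    sigma, psi = 0.03, 0.04 ** 2  # illustrative values (m3 m-3)
    print(predictive_sd(sigma, psi, new_site=False))  # residual only
    print(predictive_sd(sigma, psi, new_site=True))   # about 0.05
    print(pstdev(simulate_new_site(0.3, sigma, psi, n=20000)))
```

The simulated spread approaches the analytic value, which is the sense in which the estimated Ψ already quantifies what cross-validation at held-out sites would measure.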
 
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 535 | 195 | 37 | 767 | 33 | 32 |
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.