the Creative Commons Attribution 4.0 License.
A clustering approach to reduce computational expense in land surface models: a case study using JULES vn5.9
Abstract. Land surface models such as JULES (the Joint UK Land Environment Simulator) are usually run on a regular, rectilinear grid, resulting in gridded outputs for variables such as soil moisture and water fluxes. Here we investigate a method of clustering grid cells with similar characteristics together in JULES. Clustering grid cells has the potential to reduce computational expense as well as providing an alternative to tiling approaches for capturing sub-grid heterogeneity. In this study, we cluster grid cells exclusively in the land surface part of modelling, i.e., separate from river routing. We compare gridded and clustered soil moisture outputs from JULES with measurements from the UK Centre for Ecology and Hydrology (UKCEH) COSMOS-UK network and show that the clustering approach can model soil moisture well while reducing computational expense. However, soil moisture results are dependent on the characteristics used to create the clusters. We investigate the effect of using clusters on predicted river flows, and compare routed JULES outputs with NRFA gauge data in the catchment. We show that less expensive JULES clustered outputs give similar river flow results to standard gridded outputs when routed at the grid resolution, and are able to match observed river flow better than gridded outputs when routed at higher resolution.
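The clustering idea in the abstract can be illustrated with a minimal sketch. It assumes a plain k-means on standardised covariates, which the text here does not confirm as the authors' method. The grid size, the two synthetic covariates, the cluster count and the `kmeans` helper are all hypothetical and chosen only for illustration.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row of `features`."""
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct rows chosen at random.
    centres = features[rng.choice(len(features), size=n_clusters, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every centre.
        dist = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[k] = members.mean(axis=0)
    return labels

# Hypothetical 20 x 20 grid with two synthetic covariates per cell (assumption).
ny, nx = 20, 20
rng = np.random.default_rng(42)
covariates = rng.random((ny * nx, 2))

# Standardise so that no single covariate dominates the distance.
z = (covariates - covariates.mean(axis=0)) / covariates.std(axis=0)

n_clusters = 40  # ten times fewer computational units than the 400 grid cells
labels = kmeans(z, n_clusters)

# Stand-in for a model run: one value per cluster instead of one per cell.
cluster_output = np.arange(n_clusters, dtype=float)

# Map the per-cluster values back onto the grid for comparison with gridded output.
regridded = cluster_output[labels].reshape(ny, nx)
```

In this sketch the model would be evaluated once per cluster rather than once per cell, and the final indexing step is what allows clustered output to be compared with, or routed on, the original grid.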
Status: closed
- RC1: 'Comment on egusphere-2023-1596', Anonymous Referee #1, 12 Sep 2023
This paper presents a case study where the JULES land model is run with a vector-based spatial configuration, instead of the more typical grid-based setup. The main benefit of such an approach is that it can lead to computational efficiency, if grid cells with similar hydroclimatic behaviour can be grouped into a single computational unit. As the authors already acknowledge, this approach is not new (in addition to the already existing mention of Swenson et al., 2019, see also for example Gharari et al., 2020). The main novelties in this paper are application to the JULES model, as well as a comparison with observations of three soil moisture observation stations. I believe this is within scope of EGU journals, though HESS may be more appropriate than GMD because the tools the authors use, as well as the general concept, already exist.
I do think that the paper needs to be clarified in multiple places (see comments in the uploaded .pdf). Briefly, the paper currently assumes more background knowledge from the reader (both on terminology, as well as on the JULES model) than I think is appropriate. Additionally, methods need to be clarified in multiple places, with a particular focus on explaining to the reader why various choices are appropriate. I particularly want to highlight the choice to run JULES at a daily time step. This seems an uncommon choice for land surface models, and it is unclear to me to what extent results derived from a daily-step model configuration have practical relevance for JULES applications at sub-daily time steps. Given that these methodological choices underpin the analysis, I believe another round of reviews after these clarifications are made may be appropriate.
I further think that the presented analysis on clustering outcomes and similarity between vector-based and gridded setups (section 3.1.1) would benefit from extra analysis. Currently the reader is only presented with two sets of plots for a single time step (out of a 3-year run), and only the most minimal statistics for (I believe) a spatial comparison only. More in-depth analysis is needed to support the statements that:
- Land use and soil type are the most important covariates,
- 1000 vector-based clusters produce sufficiently similar results to a higher-resolution gridded setup.
Finally, the main conclusion of the paper is that a clustering-based spatial discretization scheme can give JULES performance similar to that of a traditional gridded setup. First, the paper focuses heavily on aggregated efficiency metrics (KGE, NSE, MAE, etc.) to compare the gridded and cluster-based setups. While these aggregated scores are indeed similar between both setups, the time series plots in the paper make it very clear that these similar scores are obtained as the result of very different internal model dynamics. I believe the conclusions need to be more nuanced to reflect this. Second and related, I believe that more investigation of these internal dynamics would strengthen the paper. This can involve more detailed analysis of model states and fluxes, as well as comparison to additional external data sources (such as ET) to determine if either of these two setups (gridded or cluster-based) is closer to reality. This would add a new dimension to the paper, in the sense that we would then better understand whether reducing the computational demand of running JULES also comes with a trade-off in model realism. Third (and this is more of a suggestion), the application domain of this test case seems somewhat small to me in both time and space. I believe the paper would be strengthened if the model domain were larger and/or the simulation times were longer.
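The point about aggregated scores hiding different dynamics can be made concrete with a small sketch. It uses the standard definitions of MAE and NSE and the original correlation/variability/bias form of KGE; whether the paper uses this KGE variant is an assumption. The series are synthetic and are not taken from the study.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    r = np.corrcoef(sim, obs)[0, 1]        # linear correlation
    alpha = np.std(sim) / np.std(obs)      # variability ratio
    beta = np.mean(sim) / np.mean(obs)     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def mae(sim, obs):
    """Mean absolute error."""
    return np.mean(np.abs(sim - obs))

# Synthetic "observed" soil moisture and two simulations (illustrative only).
t = np.linspace(0.0, 6.0 * np.pi, 365)
obs = 0.30 + 0.05 * np.sin(t)
err = 0.02 * np.cos(t)
sim_a = obs + err   # departs from obs in one direction
sim_b = obs - err   # departs from obs in the opposite direction at every step

for name, sim in [("sim_a", sim_a), ("sim_b", sim_b)]:
    print(name, "KGE", kge(sim, obs), "NSE", nse(sim, obs), "MAE", mae(sim, obs))
```

Because `sim_a` and `sim_b` have errors of equal magnitude and opposite sign at every time step, their NSE and MAE are identical even though the two series are never on the same side of the observations, which is why time series inspection is a useful complement to summary scores.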
Please see the uploaded pdf for further comments.
- RC2: 'Comment on egusphere-2023-1596', Chen Zhang, 08 Oct 2023
This paper presents a case study of the JULES land model in which a clustering approach replaces the traditional grid approach. This is shown to improve the computational efficiency of JULES, on the premise that the accuracy of the two approaches is similar. However, the HydroBlocks model and the clustering approach were already clearly presented in Chaney et al. (2016), as cited by the authors. The novelty with respect to JULES vn5.9 should therefore be stressed more in Methods; otherwise this work reads more like an application of the HydroBlocks tool, which is a significant effort but doesn’t fall in GMD’s Development and technical paper scope very well. I also think that the manuscript needs to be more structured to present a logical and complete work.
Below is a list of my detailed comments:
1. Detailed information on each set of simulations in this study should be given (e.g. in section 3.1.1), including the covariates used, the variable N, and the grid resolution. A table might be useful for this. It would make the comparisons under different conditions clearer and more reliable; without it the reader will be confused.
2. The authors mentioned “This indicates that a ten fold reduction in JULES compute expense can yield comparable results to the 1km gridded approach.” (L119-120) and “… the JULES regridded approach gives similar overall KGE results to the original gridded approach, while still benefiting from a ten fold reduction in compute resource.” (L204-205). What does the compute resource refer to in L205? Is the computational efficiency in this study evaluated by the data resources, the grid and model setup resources, or the running time of the model? This is the key point of the study and should be clarified or discussed, so that the ten-fold improvement in computational efficiency is easier to understand.
3. Please clarify the distribution characteristics of the soil moisture data, because the KGE metric was built up based on the normal distribution.
4. The locations of the 26 gauges mentioned in L168 should be shown on a map.
5. I think that the Introduction focuses on the technique and lacks a view of the wider influence of the tool and of the meaning of the work for regulators and decision makers. The sentence in L52-54 reads like a conclusion of the study and is not recommended in this part. Moreover, the novelty with respect to JULES vn5.9 should be stressed in Methods.
6. The format of units should be uniform (italic or not). The units of the y-axis should be given in most figures. Part of the title of Fig. 6 should be checked. The set 100LRU (L120) is not mentioned and there might be a writing mistake.
Citation: https://doi.org/10.5194/egusphere-2023-1596-RC2
- AC1: 'Comment on egusphere-2023-1596', Elizabeth Cooper, 22 Nov 2023
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
331 | 107 | 40 | 478 | 36 | 34 |