A clustering approach to reduce computational expense in land surface models: a case study using JULES vn5.9

Cooper, Elizabeth; Ellis, Rich; Blyth, Eleanor; Dadson, Simon

doi:https://doi.org/10.5194/egusphere-2023-1596

Preprints

10 Aug 2023

Abstract. Land surface models such as JULES (the Joint UK Land Environment Simulator) are usually run on a regular, rectilinear grid, resulting in gridded outputs for variables such as soil moisture and water fluxes. Here we investigate a method of clustering grid cells with similar characteristics together in JULES. Clustering grid cells has the potential to reduce computational expense as well as providing an alternative to tiling approaches for capturing sub-grid heterogeneity. In this study, we cluster grid cells exclusively in the land surface part of modelling, i.e., separate from river routing. We compare gridded and clustered soil moisture outputs from JULES with measurements from the UK Centre for Ecology and Hydrology (UKCEH) COSMOS-UK network and show that the clustering approach can model soil moisture well while reducing computational expense. However, soil moisture results are dependent on the characteristics used to create the clusters. We investigate the effect of using clusters on predicted river flows, and compare routed JULES outputs with NRFA gauge data in the catchment. We show that less expensive JULES clustered outputs give similar river flow results to standard gridded outputs when routed at the grid resolution, and are able to match observed river flow better than gridded outputs when routed at higher resolution.
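The clustering idea in the abstract can be illustrated with a minimal sketch. It assumes a plain k-means on standardised covariates as a stand-in for the clustering tool used in the paper, which is not reproduced here. The covariates, the 20 x 20 grid, the choice of 40 clusters, and `toy_model` are all hypothetical placeholders, not JULES inputs or JULES physics.

```python
import numpy as np


def cluster_cells(features, n_clusters, n_iter=50, seed=0):
    """Assign each grid cell to a cluster with a basic k-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant covariates
    x = (features - features.mean(axis=0)) / std
    # Initialise the centres from randomly chosen cells
    centres = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members) > 0:
                centres[k] = members.mean(axis=0)
    # Final assignment against the updated centres
    dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1)


def run_per_cluster(features, labels, model):
    """Run `model` once per cluster on cluster-mean covariates, then map back to cells."""
    out = np.empty(len(features))
    for k in np.unique(labels):
        members = labels == k
        out[members] = model(features[members].mean(axis=0))
    return out


def toy_model(mean_covariates):
    """Hypothetical placeholder for a land surface model run on one cluster."""
    return float(mean_covariates.sum())


ny, nx = 20, 20                            # hypothetical grid: 400 cells
rng = np.random.default_rng(42)
covariates = rng.random((ny * nx, 3))      # three made-up covariates per cell
labels = cluster_cells(covariates, n_clusters=40)
regridded = run_per_cluster(covariates, labels, toy_model).reshape(ny, nx)
print(f"{ny * nx} cells, {len(np.unique(labels))} model runs")
```

In this sketch the model runs at most 40 times instead of 400, and every cell in a cluster receives the same output when mapped back to the grid. That mapping is what allows clustered outputs to be compared with gridded ones.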

Received: 12 Jul 2023 – Discussion started: 10 Aug 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1: 'Comment on egusphere-2023-1596', Anonymous Referee #1, 12 Sep 2023

This paper presents a case study where the JULES land model is run with a vector-based spatial configuration, instead of the more typical grid-based setup. The main benefit of such an approach is that it can lead to computational efficiency, if grid cells with similar hydroclimatic behaviour can be grouped into a single computational unit. As the authors already acknowledge, this approach is not new (in addition to the already existing mention of Swenson et al., 2019, see also for example Gharari et al., 2020). The main novelties in this paper are application to the JULES model, as well as a comparison with observations of three soil moisture observation stations. I believe this is within scope of EGU journals, though HESS may be more appropriate than GMD because the tools the authors use, as well as the general concept, already exist.
I do think that the paper needs to be clarified in multiple places (see comments in the uploaded .pdf). Briefly, the paper currently assumes more background knowledge from the reader (both on terminology, as well as on the JULES model) than I think is appropriate. Additionally, methods need to be clarified in multiple places, with a particular focus on explaining to the reader why various choices are appropriate. I particularly want to highlight the choice to run JULES at a daily time step. This seems an uncommon choice for land surface models, and it is unclear to me to what extent results derived from a daily-step model configuration have practical relevance for JULES applications at sub-daily time steps. Given that these methodological choices underpin the analysis, I believe another round of reviews after these clarifications are made may be appropriate.
I further think that the presented analysis on clustering outcomes and similarity between vector-based and gridded setups (section 3.1.1) would benefit from extra analysis. Currently the reader is only presented with two sets of plots for a single time step (out of a 3-year run), and only the most minimal statistics for (I believe) a spatial comparison only. More in-depth analysis is needed to support the statements that:

- Land use and soil type are the most important covariates,

- 1000 vector-based clusters produce sufficiently similar results to a more high-resolution gridded setup
Finally, the main conclusion of the paper is that similar JULES performance can be obtained by using a clustering-based spatial discretization scheme as one can get with a traditional gridded setup. First, the paper focuses heavily on aggregated efficiency metrics (KGE, NSE, MAE, etc.) to compare the gridded and cluster-based setups. While these aggregated scores are indeed similar between both setups, the time series plots in the paper make it very clear that these similar scores are obtained as the result of very different internal model dynamics. I believe the conclusions need to be more nuanced to reflect this. Second and related, I believe that more investigation of these internal dynamics would strengthen the paper. This can involve more detailed analysis of model states and fluxes, as well as comparison to additional external data sources (such as ET) to determine if either of these two setups (gridded or cluster-based) is closer to reality. This would add a new dimension to the paper, in the sense that we would then better understand whether reducing the computational demand of running JULES also comes with a trade-off in model realism. Third (and this is more of a suggestion), the application domain of this test case seems somewhat small to me in both time and space. I believe the paper would be strengthened if the model domain were larger and/or the simulation times were longer.
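For readers unfamiliar with the aggregated metrics named above, the following is a minimal sketch of their commonly used definitions, assuming the 2009 form of the Kling-Gupta efficiency (correlation, variability ratio, and bias ratio). The function names and the short soil moisture series are hypothetical, and the paper's own implementation may differ.

```python
import numpy as np


def mae(sim, obs):
    """Mean absolute error between simulated and observed series."""
    return float(np.mean(np.abs(sim - obs)))


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Hypothetical observed soil moisture values (volumetric fraction)
obs = np.array([0.20, 0.30, 0.25, 0.40, 0.35])
sim_gridded = obs + 0.02                                     # made-up constant wet bias
sim_clustered = obs.mean() + 1.3 * (obs - obs.mean())        # made-up amplified variability

for name, sim in [("gridded", sim_gridded), ("clustered", sim_clustered)]:
    print(name, kge(sim, obs), nse(sim, obs), mae(sim, obs))

# A direct comparison of the two simulations, independent of the observations
print("gridded vs clustered MAE:", mae(sim_gridded, sim_clustered))
```

Each score collapses a whole time series into one number, so two simulations with different dynamics can score similarly against observations. Comparing the simulations with each other, as in the last line, is one simple way to expose such differences.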
Please see the uploaded pdf for further comments.

Citation: https://doi.org/10.5194/egusphere-2023-1596-RC1
RC2: 'Comment on egusphere-2023-1596', Chen Zhang, 08 Oct 2023

This paper presents a case study of the JULES land model in which a clustering approach replaces the traditional grid approach. This effort is shown to improve the computational efficiency of the JULES model on the premise that the accuracy is similar between the two approaches. However, the HydroBlocks model and the cluster approach were clearly presented in Chaney et al. (2016), as the authors cited. The novelty of the JULES vn5.9 application should therefore be stressed more in the Methods; otherwise this work reads more like an application of the HydroBlocks tool, which is a significant effort but doesn’t fall within GMD’s Development and technical paper scope very well. I also think that the manuscript needs to be more structured to present a logical and complete work.
Below is a list of my detailed comments:

1. Detailed information on each set of simulations in this study is suggested (e.g. in section 3.1.1), which might include the values of the covariates, the variable N, and the grid resolution. A table might be useful for this. It would make the comparisons under different conditions clearer and more reliable, and avoid confusing the reader.

2. The authors mentioned “This indicates that a ten fold reduction in JULES compute expense can yield comparable results to the 1km gridded approach.” (L119-120) and “… the JULES regridded approach gives similar overall KGE results to the original gridded approach, while still benefiting from a ten fold reduction in compute resource.” (L204-205). What does the compute resource refer to in L205? Is the computational efficiency in this study evaluated by the data resources, the grid and model setup resources, or the running time of the model? This is the key point of this study and thus should be clarified or discussed. The ten-fold improvement in computational efficiency would then be easier to understand.

3. It is suggested to clarify the distribution characteristics of the soil moisture data because the KGE metric was built up based on the normal distribution.

4. The locations of the 26 gauges the authors mentioned in L168 are suggested to be mapped.

5. I do think that the Introduction is more focused on the technique while lacking a view of the further influence of the tool and the meaning of the work to regulators and decision makers. The sentence in L52-54 reads like a conclusion of this study and is not recommended in this part. Moreover, the novelty of the JULES vn5.9 should be stressed in Methods.

6. The format of units should be uniform (italic or not). The units of the y-axis in most figures should be presented. Part of the title in Fig. 6 should be checked. The set 100LRU (L120) is not mentioned and there might be a writing mistake.

Citation: https://doi.org/10.5194/egusphere-2023-1596-RC2
AC1: 'Comment on egusphere-2023-1596', Elizabeth Cooper, 22 Nov 2023

We thank both reviewers for taking the time to provide comments and suggestions, which will help to improve our manuscript. Please see the uploaded supplement for our responses to both reviewers' comments.

Citation: https://doi.org/10.5194/egusphere-2023-1596-AC1

Viewed

Total article views: 446 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
304	104	38	446	34	33

Views and downloads (calculated since 10 Aug 2023)

Month	HTML	PDF	XML	Total
Aug 2023	79	25	4	108
Sep 2023	65	28	4	97
Oct 2023	43	13	5	61
Nov 2023	17	1	4	22
Dec 2023	15	7	3	25
Jan 2024	12	2	3	17
Feb 2024	19	7	0	26
Mar 2024	12	6	0	18
Apr 2024	17	3	2	22
May 2024	6	6	2	14
Jun 2024	10	5	4	19
Jul 2024	9	1	7	17

Viewed (geographical distribution)

Total article views: 450 (including HTML, PDF, and XML). Thereof 450 with geography defined and 0 with unknown origin.

Latest update: 26 Jul 2024

Short summary

We have tested a different way of simulating soil moisture and river flow. Instead of dividing the land up into over 10,000 squares to run our numerical model, we cluster the land into fewer, irregular areas with similar landscape characteristics. We show that different ways of clustering the landscape produce different patterns of soil moisture. We also show that with this method we can match observations as well as our usual gridded approach while using ten times less computational resource.

