Ecosystem connections in the shelf sea environment using complex networks

Higgs, Ieuan; Skákala, Jozef; Bannister, Ross; Carrassi, Alberto; Ciavatta, Stefano

doi:10.5194/egusphere-2023-475

Preprints

https://doi.org/10.5194/egusphere-2023-475

17 Apr 2023

Abstract. We use complex network theory to better represent and understand the ecosystem connectivity in a shelf-sea environment. The baseline data used for the analysis are obtained from a state-of-the-art coupled marine physics-biogeochemistry model simulating the North-West European Shelf (NWES). The complex network built on model outputs is used to identify the functional types of variables behind the biogeochemistry dynamics, suggesting how to simplify our understanding of the complex web of interactions within the shelf-sea ecosystem. We demonstrate that complex networks can be also used to understand spatial ecosystem connectivity, both identifying the (geographically varying) connectivity lengthscales and the clusters of spatial locations that are connected. These clusters indicate geographic regions where there is a substantial flow of information between the degrees of freedom within the ecosystem, while information exchange across the boundaries of these regions is limited. The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem, and can be used in multiple future applications such as stochastic noise modelling, data assimilation, or machine learning.

Received: 15 Mar 2023 – Discussion started: 17 Apr 2023

Competing interests: At least one of the (co-)authors is a member of the editorial board of Biogeosciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1797 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Supplement (232 KB)

Journal article(s) based on this preprint

08 Feb 2024

Investigating ecosystem connections in the shelf sea environment using complex networks

Ieuan Higgs, Jozef Skákala, Ross Bannister, Alberto Carrassi, and Stefano Ciavatta

Biogeosciences, 21, 731–746, https://doi.org/10.5194/bg-21-731-2024, 2024

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-475', Anonymous Referee #1, 16 Jun 2023

This paper analyses output from a complex biogeochemical model, ERSEM, using network analysis. The analysis is used for several purposes: evaluating the spatial length scale of the variables, determining areas of coherent biogeochemical interactions and boundaries of low connectivity, and establishing which variables are highly connected with each other. This information is useful when setting up regional systems and when evaluating the interactions between model variables and whether the system can be approximated well by a simpler representation. The length scales are useful in data assimilation systems, when setting the area of influence of the observations. I think the paper provides new knowledge worth publishing, but first I would like the following points to be addressed:

- Only surface data are used. This is reasonable to reduce the amount of data, but it requires a discussion of the implications of such a choice. For example, in the resulting network (Figure 9) the detritus is completely disconnected from the phyto- and zooplankton; since detritus sinks quickly, it would not remain at the surface, so maybe using only surface data is the reason for this disconnect? There is also the question of whether other methods to reduce the data size, retaining more information throughout the water column, could have been used.

- The longer timescales are filtered out, so biogeochemical feedback mechanisms that act on timescales > 10 days may also have been filtered out. What happens when the resulting network is used to inform an emulator that is then applied in the context of climate, as suggested by the authors? This also needs to be addressed in the discussion.

- Applicability of results: Would the results of the analysis be valid for other models? For example, could the length scales obtained be used in a data assimilation system using a BGC model other than ERSEM? Would the length scales apply when assimilating observations deeper in the water column, given that your results are based only on surface model data?

- The description of the methods could be improved for the benefit of the reader; I provide some suggestions below for what needs to be clarified.
Specific comments
Title: Could the title be improved by adding “Investigating” at the beginning?
Abstract:
The expression “functional types of variables” is used in the abstract and in the text, and it is a bit unclear to me what it means. The expression becomes particularly confusing since ERSEM itself also includes functional types of plankton. Consider either using a different expression or defining it properly before using it.
“Be also used” should be “also be used”
What is meant by “flow of information between degree of freedom”?
The first part of the last sentence is unclear to me: I don’t see it demonstrated anywhere how these results can be used to understand how a perturbation propagates through the ecosystem.
Line 38: “…investigate three relevant questions related …”: either formulate the three topics as questions or rewrite the sentence on line 38.
Line 40: “based on” should be “apply”.
Line 40: Is this length scale only useful when applying variational data assimilation, not other (ensemble) data assimilation techniques?
Line 49: as mentioned before, the use of the expression “functional type” is a bit confusing; please define it here.
Line 51: The statement that these traditional biogeochemical models are unsuitable to address the response to climate change, effectively writing off all CMIP simulations, is quite severe; I would suggest moderating it. However, I do agree that lighter model systems are more suitable for ensemble simulations, but if they are trained on data from the present day, they may not be very good at representing the future ecosystem response.
Line 88: Were the river nutrients also included and were they also annual?
Line 120: the transformation to the time-local standardised form is very well explained, but I wonder what happens in periods when the standard deviation is low or zero (for example in winter): does and stay finite?
Line 120: Would river input influence the network results? For example, would there be a stronger connection between the biogeochemistry and salinity in a region of strong river influence, i.e. would the network presented in Figure 9 differ from region to region?
Line 124: I did not see it specified anywhere that some data were treated differently, so could you simply write that all data were treated this way?
Section 3.2.1 (Biogeochemical length scale estimation): What did you do in regions close to land or the boundary? Did you skip the length scale computation there, or only consider the ocean points? The same question applies to the method in Sect. 3.2.2.
Difference between the methods in 3.2.1 and 3.2.2: Am I correct that the difference between 3.2.1 and 3.2.2 is that 3.2.1 is done on a finer grid and uses a different method to compute the length scale? Is the coarsening before computing the length scale primarily used to reduce the amount of data given to the SGC, or are there other reasons to compute the length scales twice? This could be made clear in the paper.
Line 153: How was the grid upscaled from 7 to 21 km?
Lines 154-160, explanation of pruning: This is very hard to understand; please explain better how this was done.
Line 170: the sentence starting “We took …” and the next sentence mean exactly the same thing. Remove the first sentence (or the last; it is up to the authors, but I preferred the last one).
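As an illustration of the zero-variance question above, a minimal Python sketch of one possible time-local standardisation, assuming it means subtracting a centred running mean and dividing by a running standard deviation. The window length (`half_window`) and the floor (`eps`) are hypothetical choices, not values from the manuscript; the floor is one simple way to keep the standardised value finite when the local standard deviation collapses.

```python
import math


def time_local_standardise(series, half_window=5, eps=1e-12):
    """Standardise each value against a centred moving window.

    half_window and eps are illustrative choices, not values from the
    manuscript. Windows are truncated at the ends of the series. Where the
    local standard deviation falls below eps (e.g. a flat winter signal),
    the value is set to 0.0 instead of dividing by (almost) zero.
    """
    n = len(series)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = series[lo:hi]
        mean = sum(window) / len(window)
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
        out.append(0.0 if std < eps else (series[i] - mean) / std)
    return out
```

Mapping flat periods to 0.0 is only one convention; whichever rule is used, stating it explicitly would answer the question.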
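To make the land/boundary question concrete, a minimal Python sketch, assuming (this is not confirmed by the manuscript) that the length scale at a grid point is the e-folding distance at which its correlation with surrounding points first drops below 1/e, and that land or out-of-domain neighbours are simply skipped (passed as `None`). The function name and input format are hypothetical.

```python
import math


def efolding_lengthscale(profile):
    """Estimate an e-folding length scale from (distance_km, correlation) pairs.

    Pairs with correlation None (e.g. land or outside the model domain) are
    skipped. Returns the distance at which the correlation first drops from
    >= 1/e to < 1/e, linearly interpolated between the bracketing points, or
    None if no such crossing exists among the remaining ocean points.
    """
    threshold = 1.0 / math.e
    ocean = sorted((d, c) for d, c in profile if c is not None)
    for (d0, c0), (d1, c1) in zip(ocean, ocean[1:]):
        if c0 >= threshold > c1:
            # linear interpolation between the two bracketing distances
            return d0 + (c0 - threshold) * (d1 - d0) / (c0 - c1)
    return None
```

Skipping land points stretches the interpolation interval near coasts, which is one reason the treatment of such points is worth documenting.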
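The upscaling method is the open question here; one common option, sketched purely as an assumption, is to average non-overlapping 3 × 3 blocks of 7 km cells into one 21 km cell, ignoring land cells (`None`) and keeping a coarse cell as land only when all nine fine cells are land. The function name is hypothetical.

```python
def coarsen_3x3(field):
    """Average non-overlapping 3x3 blocks of a 2-D grid (list of lists).

    None marks land; a coarse cell is the mean of its ocean cells and is None
    only if every fine cell in the block is land. Trailing rows/columns that
    do not fill a complete block are dropped.
    """
    ny, nx = len(field), len(field[0])
    coarse = []
    for j in range(0, ny - ny % 3, 3):
        row = []
        for i in range(0, nx - nx % 3, 3):
            vals = [field[jj][ii]
                    for jj in range(j, j + 3)
                    for ii in range(i, i + 3)
                    if field[jj][ii] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(row)
    return coarse
```

Other choices (subsampling every third point, or coarsening the correlations rather than the fields) would give different networks, so the method used matters.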
Before line 180: This is not easy to understand, could you please try to make this clearer:

“This was done by taking the mean lengthscale at each grid point across all variables from the dynamically thresholded spatial networks. In order to assess whether this spatial variation could be well approximated by the mean of these lengthscales, we compared the spatial distribution of lengthscales between each different variable using Pearson’s correlation. Here, we would expect to see a high correlation if the structure of the spatially varying lengthscales is consistent. This set of spatially varying lengthscales was then represented as a ratio of the mean.”
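Read literally, the quoted procedure amounts to the decomposition below. This is one plausible reading, and the symbols are introduced here for illustration rather than taken from the manuscript: $L_v(\mathbf{x})$ is the length scale of variable $v$ at grid point $\mathbf{x}$, $N_v$ the number of variables, and $\langle\cdot\rangle_{\mathbf{x}}$ a spatial average over the domain.

```latex
\bar{L}(\mathbf{x}) = \frac{1}{N_v}\sum_{v=1}^{N_v} L_v(\mathbf{x}), \qquad
r(\mathbf{x}) = \frac{\bar{L}(\mathbf{x})}{\langle \bar{L} \rangle_{\mathbf{x}}}, \qquad
L_v(\mathbf{x}) \approx \langle L_v \rangle_{\mathbf{x}}\, r(\mathbf{x})
```

The last step is only a good approximation when the spatial patterns of $L_v$ are strongly correlated across variables, which is what the Pearson comparison in the quote is meant to check.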
Line 193: “a links … defined by the Spearman correlation.. ”: at this point several Spearman correlations have been introduced (the length scale of the correlation with itself on a 7 km grid, the length scale on a 21 km grid, and the correlations between the length scales of different variables), so which one does this refer to?
Paragraph at lines 190-200: Please write out the equations on their own lines (as on page 7) and number them for the benefit of the reader.
Line 220: This is difficult to follow: “In order to compare the regionalisation of each variable, we first projected the cluster labels of each node back onto the horizontal plane. Then, we applied an edge detection kernel to identify the boundaries between differently labelled regions, creating a boundary map for each variable (with value 1 at boundary grid points and 0 elsewhere).” Please refer back to the appropriate equation on the previous page (ref. my comment above).
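A minimal sketch of the quoted boundary-map step, assuming the “edge detection kernel” flags a grid point as boundary when any of its four nearest neighbours carries a different cluster label (the manuscript’s actual kernel may differ). `None` marks land, which is not treated as a boundary in this sketch; the function name is hypothetical.

```python
def boundary_map(labels):
    """Return a 0/1 grid marking points whose 4-neighbours have a different label.

    labels is a 2-D list of cluster labels; None marks land and never
    produces a boundary (an assumption made for this sketch).
    """
    ny, nx = len(labels), len(labels[0])
    out = [[0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            here = labels[j][i]
            if here is None:
                continue
            for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                jj, ii = j + dj, i + di
                if 0 <= jj < ny and 0 <= ii < nx:
                    there = labels[jj][ii]
                    if there is not None and there != here:
                        out[j][i] = 1
                        break
    return out
```

With this rule both sides of a label change are flagged, so boundaries are two cells wide; a one-sided kernel would give thinner lines.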
Line 235: You calculate the adjacency matrix at 300 points randomly selected over the shelf (depth < 200 m) and then average them. Later, you say “the boundaries particularly seem to reflect shallower bathymetry (approx. 100 m) than the 200 m depth usually applied to delimit the margins of shelf-seas, including NWES.” So why not sample within 100 m?
Line 255: Be precise: inclusion of new types of observations *for assimilation* …

Line 255: I suggest to remove “profound”.
Line 265: suggest: “oxygen have different lengthscales …”
Line 379: “… we applied SGC…”: did you also test different values of k here?
Line 390: “Ammonium dynamics are relatively more complex than the ones of nitrate.” This sentence can be removed.
Figure 9: How were the lines connecting the different variables decided?
Line 427: I suggest to use another word than “dismantling”.
Concerning the supporting information: it would be easier to understand if the plotted variables were given standard names and the y-axes were labelled with units.

Citation: https://doi.org/10.5194/egusphere-2023-475-RC1
- AC1: 'Reply on RC1', Ieuan Higgs, 21 Jul 2023
  
  Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
  Best wishes,
  Ieuan Higgs and the co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2023-475-AC1
RC2:
'Comment on egusphere-2023-475', Damien Couespel, 26 Jun 2023

Overview

In this paper, the authors use complex network theory with outputs from a model simulation of the North-West European Shelf (NWES) to identify 1) spatial correlation length scales of biogeochemical variables, 2) geographical regions with strong spatial correlation within them and weak correlation between them and 3) correlations between biogeochemical variables. Point 1) is achieved by computing Spearman’s correlation coefficient between the time series of the different grid points. For point 2), for each variable, they build a spatial network with the previous coefficient, apply spectral graph clustering to group grid points and identify the boundaries of these clusters. Then, they define the regions based on the fraction of variables that have a boundary at each grid point. For point 3), they compute Spearman’s correlation coefficient between the spatial distributions of each variable, build a network from it and use spectral graph clustering to cluster the biogeochemical variables. A first result of this work is to show that complex network theory can be used to identify biogeochemical regions based on spatial correlation, or to identify correlations between biogeochemical variables. This is of interest for reducing the complexity of biogeochemical dynamics and for helping the analysis of simulations. The correlation length scales are of interest for data assimilation as they quantify the range of influence between grid points.
I very much enjoyed reading the paper. It is clear and well written. The results are of interest and worth publishing. It presents an interesting way to analyse biogeochemical model outputs. The definition of biogeochemical provinces is particularly interesting as it can help the analysis of models. The methods are clearly explained. I do not have major comments on the paper, but rather a list of minor or specific comments that I think could further improve it. The more important comments are highlighted in red (see the attached PDF file for the coloured version).
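Points 1) and 2) rest on Spearman correlations between grid-point time series. A minimal Python sketch, assuming a fixed correlation threshold (the manuscript also uses dynamic thresholding, which is not reproduced here): Spearman’s rho computed as the Pearson correlation of average ranks, and a 0/1 spatial network linking pairs of grid points whose rho exceeds the threshold. The function names and the default threshold of 0.5 are illustrative.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            r[k] = mean_rank
        start = end + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks (non-constant series only)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def spatial_adjacency(series_by_point, threshold=0.5):
    """0/1 adjacency matrix linking grid points i != j whose rho exceeds threshold."""
    n = len(series_by_point)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if spearman(series_by_point[i], series_by_point[j]) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj
```

Rank correlation makes the links insensitive to monotonic but nonlinear relationships, which is a natural fit for skewed biogeochemical concentrations.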

As a summary of my comments, here are my answers to the review criteria at Biogeosciences. I just selected the relevant questions:

1. Do the authors give proper credit to related work and clearly indicate their own new/original contribution? Yes. Maybe a bit of comparison with the literature on correlation length scales could benefit the paper.

2. Does the abstract provide a concise and complete summary? Mostly. It could be improved by more clearly stating the results.

Minor and specific comments

Abstract

I think the results should be stated more clearly and precisely in the abstract. It seemed a bit too vague to me. For example:

- l. 4: « to identify the functional types », which one are they exactly?

- l. 6: « identifying the (geographically varying) connectivity lengthscales and the clusters of spatial locations that are connected. » What are the main findings concerning the length scales? What are the different clusters? For the length scales, a result that seems particularly interesting is that the spatial variability is quite similar between variables, so that it only needs to be scaled by the mean length.

- l. 9: « The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem »: it is difficult to agree with this last sentence since the results were not clearly stated before. After finishing reading the paper, I also do not think the results help to understand how perturbations propagate in the ecosystem. The results rather offer an analysis framework to do that.

- l. 9: « antrophogenic » -> anthropogenic

Introduction

l. 35: « an abstraction that will allow for smarter decision-making when considering data sampling and feature selection for ML. » It is not clear to me how and why an abstraction can allow smarter decision-making.

l. 37-50: Very nice paragraph clearly stating the objective of the work.

Model and Data

Sec. 2.1: I think it would be nice to have a bit more detail about the configuration, such as numerical schemes, diffusion, viscosity, equation of state and forcings (wind, temperature?), and how the simulations are run (spin-up procedure, initialisation...). The references to other papers should be for further details only; the reader should not need to read these papers to get a basic understanding of the configuration.

Methodology

Sec. 3.1: Maybe a figure in the supplement showing the raw and filtered time series, or a periodogram, could be useful to illustrate which timescales are filtered. It should probably be stated earlier (in the introduction, or somewhere in the methods) which timescales are of interest, and why. Out of curiosity, have you tried your analysis with the seasonal signal?

l. 154: « to a 21 km spatial resolution » makes me wonder whether the results are sensitive to the resolution of the model. Longer length scales because of eddy mixing? Or shorter ones because of dynamical barriers created by filaments or eddies? This also somewhat questions the isotropy assumption.

l. 162: I do not understand why the authors say: « As opposed to the biogeochemical lengthscales computed in Sect. 3.2.1 [...] here we manipulate the spatial networks to look at the spatial dependency of this length scale. » In Sect. 3.2.1 you also have a map of the length scales that gives you the spatial information (Fig. 2). I do not see the benefit of having these two definitions. Note that this also causes some confusion about which length scales are used in the different plots. For example, which one is used in Fig. 4? And in Fig. 5? I gathered that Fig. 4 uses the length scale defined in Sect. 3.2.1 and Fig. 5 the one in Sect. 3.2.3, but it is not so clear.

l. 167: « black » rather than « red »?

Sec. 3.3: This part is not easy to follow. A short description of the objective at the beginning could help the reader: what are the objects to be clustered, and following which criteria? If I understood correctly, the goal is to cluster grid points according to their temporal correlation with each other, for each variable, so that grid points with strong correlation are grouped together.
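For readers unfamiliar with the clustering step, a minimal sketch of spectral graph clustering in its simplest form, k = 2 spectral bisection: nodes are split by the sign of the Fiedler vector of the unnormalised graph Laplacian L = D - A, found here by power iteration. This is a stand-in under stated assumptions, not the manuscript’s implementation; k-way clustering (the choice of k is raised by the first referee at Line 379) usually embeds nodes with several Laplacian eigenvectors and then applies k-means, and real code would use a sparse eigensolver. The function name and defaults are hypothetical.

```python
import math
import random


def spectral_bisection(adj, iterations=500, seed=0):
    """Split a graph into two clusters by the sign of its Fiedler vector.

    adj is a symmetric, non-negative adjacency matrix (list of lists) with at
    least two nodes. The Fiedler vector (eigenvector of the second-smallest
    eigenvalue of L = D - A) is found by power iteration on M = c*I - L,
    projecting out the constant eigenvector at every step.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    c = 2 * max(degree) + 1  # exceeds the largest Laplacian eigenvalue
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # stay orthogonal to the constant vector
        # w = M v = c*v - D v + A v
        w = [c * v[i] - degree[i] * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]
```

Nodes joined by many strong links end up on the same side of the cut, which is the sense in which strongly correlated grid points are grouped together.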

Results and Discussion
Sec. 4.1: As mentioned before, stating which length scale (the one from Sect. 3.2.1 or Sect. 3.2.3) the authors refer to would help the reader. Since two definitions of length scale seem to be used, it feels natural to wonder how they compare.

l. 275-278: I think I got the general idea here: the spatial distribution of the length scale of a specific variable is the product of Fig. 5a and Fig. 4. However, since Fig. 4 and Fig. 5a do not seem to use the same definition of the length scale, it is a bit confusing.

Sec. 4.1: I am not familiar with the length-scale literature, but there seems to be some (judging from a quick search on Google Scholar). A comparison of the results and methods with this literature is missing. Are there other definitions of length scale? How does the method used in this paper compare with others? Are the length scales similar to former estimates?

Fig. 7: How is it done? I guess it is some kind of generalisation of Fig. 6 but it would be good to know more than « We used those robust boundaries to identify 13 regions representing areas of NWES connectivity. Results of this regionalisation are represented in Fig. 7. » (line 315)
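One plausible procedure for Fig. 7, sketched under stated assumptions: count, at each grid point, the fraction of variables whose boundary map flags that point; treat points at or above a threshold (`min_fraction`, hypothetical) as robust boundaries; and label the remaining points as 4-connected regions with a breadth-first flood fill. The threshold, the connectivity rule and the neglect of land are all assumptions, and this does not reproduce the 13 regions in the manuscript.

```python
from collections import deque


def regions_from_boundaries(boundary_maps, min_fraction=0.5):
    """Label connected regions separated by robust boundaries.

    boundary_maps: list of 0/1 grids (one per variable, same shape).
    A cell is a robust boundary if at least min_fraction of the variables
    flag it. Remaining cells are grouped into 4-connected regions.
    Returns (fraction grid, label grid with -1 on robust boundaries).
    """
    nv = len(boundary_maps)
    ny, nx = len(boundary_maps[0]), len(boundary_maps[0][0])
    fraction = [[sum(m[j][i] for m in boundary_maps) / nv for i in range(nx)]
                for j in range(ny)]
    labels = [[-1] * nx for _ in range(ny)]
    next_label = 0
    for j0 in range(ny):
        for i0 in range(nx):
            if fraction[j0][i0] >= min_fraction or labels[j0][i0] != -1:
                continue
            # breadth-first flood fill of one region
            labels[j0][i0] = next_label
            queue = deque([(j0, i0)])
            while queue:
                j, i = queue.popleft()
                for jj, ii in ((j + 1, i), (j - 1, i), (j, i + 1), (j, i - 1)):
                    if (0 <= jj < ny and 0 <= ii < nx
                            and labels[jj][ii] == -1
                            and fraction[jj][ii] < min_fraction):
                        labels[jj][ii] = next_label
                        queue.append((jj, ii))
            next_label += 1
    return fraction, labels
```

Raising `min_fraction` keeps only boundaries shared by most variables and therefore merges regions, so the number of regions depends directly on this choice.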

l. 350: « or build simpler models than ERSEM »: I think this needs to be said a bit differently. The complexity of models tends to increase in order to better represent (or in the hope of better representing) the real world, and NPZD models with just one phytoplankton, one zooplankton... already exist. Here the issue is to simplify ERSEM while keeping an accurate representation. Maybe something like line 51: « simplified (yet realistic with respect to the objectives) ».

l. 363-366: I do not see that in Fig. 8. The mean correlation between POM (yellow) and the Higher Trophic Levels + DOM (pink) is rather low. The authors should clarify.

Conclusions
l. 410-426: Here you are a bit more specific about the results, and this could be used in the abstract, e.g. « we can conclude that the biogeochemical lengthscales vary significantly between variables and are not directly transferable. » or « we have provided an approximation for the lengthscale of each variable, and each spatial location, that is informed by the high correlation in the spatial variability between lengthscales of each variable »...

l. 421-424: « Our analysis demonstrated that the chemical components (e.g., nitrogen, carbon, silicon. . . etc) of each pelagic variable (e.g., diatoms, nanophytoplankton, microzooplankton) are closely linked and a simpler version of the model can be built, by reducing these variables through parametrization. » I do not know ERSEM, but I assume that, as with many models, it started from a simple version and its complexity has been increased (e.g. addition of more phytoplankton types). I am wondering how the grouping compares with a former, simpler version of ERSEM. I suppose it should be relatively similar (e.g. all types of phytoplankton gathered into only one); however, it would be quite interesting if some groupings were different.
Extra comments
« lengthscales »: After a quick search on Google Scholar, it seems that it is more commonly written « length scales » or « length-scales ».

The regions defined in Fig. 7 could be used for sampling the domain to analyse the inter-variable interaction network, for example by selecting grid points only within one region and comparing with the same analysis done in another region. Are the interactions between variables different between two regions? Or by sampling evenly between the regions to have a fair general representation? This point is mostly out of curiosity, as it seems natural to try to use these regions.

l. 367: Butenschon et al. (2015) and Butenschon et al. (2016) are essentially the same paper (2015 is the discussion version of 2016). Better to keep only 2016.

Citation: https://doi.org/10.5194/egusphere-2023-475-RC2
- AC2: 'Reply on RC2', Ieuan Higgs, 21 Jul 2023
  
  Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
  Best wishes,
  Ieuan Higgs and the co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2023-475-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-475', Anonymous Referee #1, 16 Jun 2023

This paper analyses output from a complex biogeochemical model, ERSEM, using network analysis. The analysis is used for several purposes: evaluating the spatial length scale of the variables, determining areas of coherent biogeochemical interactions and boundaries of low connectivity, and establishing which variabels are highly connected with each other. This information is useful when setting up regional systems. and evaluating the interactions between model variables and weather the system can be approximated well by a simpler representation. The length scales are useful in data assimilation systems, when setting the area of influence of the observations. I think the paper provide new knowledge worth publishing, but before I would like the following points addressed:

- Only surface data is used, this is reasonable to reduce the amount of data, but it would require a discussion of the implications of such a choice. For example, in the resulting network from the analysis (Figure 9) the detritus is completely disconnected from the photo and zooplankton, but as that quickly sinks out it would not remain one on the surface and maybe using only surface data is the reason for this disconnect? There is also a question wether there are other methods to reduce the data size that would retain more information throughout the water-column that could have been used?

- The longer time-scales are filtered out, so there could be biogeochemical feedback mechanisms that work on timescales >10 days that are filtered out. So what happens when resulting network is used to inform an emulator, and then applied in the context of climate as suggested by the authors? This also needs to be addressed in the discussion.

- Applicability of results: Would this results of the analysis be valid other models? For example could the length scales obtained be used in data assimilation system using another BGC model than ERSEM? Would the length scales apply when assimilation observations deeper in the water column even if your results that are only based on surface model data?

- The description of the methods could be improved for the benefit of the reader, I provide some suggestions for what needs to be clarified below.
Specific comments
Title: Could the title be improved but adding “Investigating” at the beginning?
Abstract:
The expression “functional types of variables” is used in the abstract and in the text, it is a bit unclear to me what this means. The expression becomes particularly confusing since the ERSEM itself also includes functional types of plankton. Consider either using a different expression or define it properly before using it.
“Be also used” should be “also be used”
What is meant by “flow of information between degree of freedom”
The first part of the last sentence is unclear to me: I don’t see that it is demonstrated anywhere how these results can be used to understand how a perturbation propagate through the ecosystem.
Line 38: “…investigate three relevant questions related …” either formulate the three topic as questions or rewrite the sentence on line 38.
Line 40: “based on” should be “apply”.
Line 40: Is this length scale only useful when applying variational data assimilation, not other (ensemble) data assimilation techniques?
Line 49: as mentioned before, the use of the expression the use of the expression ”functional type” is a bit confusing, please define it here.
Line 51: The statement that these traditional biogeochemical models are unsuitable to address response to climate change, effectively writing off all CMIP simulations is quite severe, I would suggest to moderate the statement. However I do agree that lighter model systems are more suitable for ensemble simulations, but it they are trained on data from the present day, they may not be very good at representing future ecosystem response.
Line 88: Were the river nutrients also included and were they also annual?
Line 120: the transformation to the time-local standardised form is very well explained, but I wonder what happens in period when standar deviation is low or zero (for example I winter), does and stay finite?
Line 120: Would river input influence the network results, for example would there be a stronger connection between the biogeochemistry and salinity in a region of strong river influence. I.e. would the network presented in figure 9 differ from region from region to region?
Line 124: I did not see it specified anywhere that data were treated any differently, so could you just simply write that all dat were treated this way?
Sections 3.2.1: Biogeochemical length scale estimation: What did you do in regions close to land or the boundary? Did you not compute the length scale or only consider the ocean points? The same question applies to the method in 3.2.2
Difference between method in 3.2.1 and 3.2.2: Am I correct that the difference between 3.2.1 and 3.2.2 is that 3.2.1 is done on a finer grid and uses a different method to compute the length scale? The coarsening before computing the length scale is primarily used to reduce the amount of data given to the SGC? Is this correct or are there other resort to compute length-scales twice? This could be made clear in the paper.
Line 153: How was the grid upscaled from 7 to 21 km?
Line 154-160 Explanation of pruning: This is very hard to understand, please explain better how this was done.
LIne 170 from “We took …” and the next sentence mean exactly the same thing. Remove the first sentence (or last, up the author, but I preferred the last).
Before line 180: This is not easy to understand, could you please try to make this clearer:

“This was done by taking the mean lengthscale at each grid point across all variables from the dynamically thresholded spatial networks. In order to assess whether this spatial variation could be well approximated by the mean of these lengthscales, we compared the spatial distribution of lengthscales between each different variable using Pearson’s correlation. Here, we would expect to see a high correlation if the structure of the spatially varying lengthscales is

consistent. This set of spatially varying lengthscales was then represented as a ratio of the mean.”
Line 193: “a links … defined by the Spearman correlation.. ” at this point there has been introduces severe spearman correlation, the length scale of the correlation with itself on a 7 km grid, the length-scale om a 21 km grid and the correlations between the length scales of different variables, so which one does this refer to here?
Paragraph line 190-200: Please write out the equations on its own line (as on page 7) and give them numbers to benefit the reader.
Line 220: This is difficult to follow: “In order to compare the regionalisation of each variable, we first projected the cluster labels of each node back onto the horizontal plane. Then, we applied an edge detection kernel to identify the boundaries between differently labelled regions, creating a boundary map for each variable (with value 1 at boundary grid points and 0 elsewhere).” Please refer back to the appropriate equation on the previous page (ref. my comment above).
Line 235: You calculate the mean adjacency matrix over 300 point randomly selected over the shelf <200 meter and then average that. Then later you say “the boundaries particularly seem to reflect shallower bathymetry (approx. 100 m) than the 200 m depth usually applied to delimit the margins

of shelf-seas, including NWES.” So why not samle within 100 meters?
Line 255: Be precise: inclusion of new types of observations *for assimilation* …

Line 255: I suggest to remove “profound”.
Line 265: suggest: “oxygen have different lengthscales …”
Line 379: “… we applied SGC…”: did you also test different values of k here?
Line 390: “Ammonium dynamics are relatively more complex than the ones of nitrate.” This sentence can be removed.
Figure 9: How was the lines connecting the different variables decided?
Line 427: I suggest to use another word than “dismantling”.
Concerning the supporting information, this would be easier to understand if the variables plotted were given standard names and the y-axis were supplied with the units.

Citation: https://doi.org/10.5194/egusphere-2023-475-RC1
- AC1: 'Reply on RC1', Ieuan Higgs, 21 Jul 2023
  
  Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
  Best wishes,
  Ieuan Higgs and the co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2023-475-AC1
RC2:
'Comment on egusphere-2023-475', Damien Couespel, 26 Jun 2023

Overview

In this paper, the authors use complex network theory with outputs from a model simulation of the North-West European Shelf (NWES) to identify 1) spatial correlation length scales of biogeochemical variables, 2) geographical regions with strong spatial correlation within them and weak correlation between them and 3) correlations between biogeochemical variables. Point 1) is achieved by computing the Spearman’s correlation coefficient between the time series of the different grid points. For point 2), for each variable, they build a spatial network with the previous coefficient, apply spectral graph clustering to gather grid-points and identify the boundaries of these clusters. Then, they define the regions base on the fraction of variables that have a boundary in each grid point. For point 3), they compute the Spearman’s correlation coefficient between the spatial distributions of each variable, build a spatial network with that and use the spectral graph clustering to cluster biogeochemical variables. A first result of this work is to show that complex network theory can be used to identify biogeochemical regions based on spatial correlation or to identify correlation between biogeochemical variables. This is of interest for reducing the complexity of biogeochemical dynamics and for helping the analysis of simulations. The correlation length scales are of interest for data assimilation as it quantify the range of the influence between grid points.
I very much enjoyed reading the paper. It is clear and well written. The results are of interest and worth publishing. It presents an interesting way to analyse biogeochemical model outputs. The definition of biogeochemical provinces is particularly interesting, as it can help the analysis of models. The methods are clearly explained. I do not have major comments on the paper, but rather a list of minor or specific comments that I think could further improve it. The more important comments are highlighted in red (see the attached PDF file for the coloured version).

As a summary of my comments, here are my answers to the review criteria of Biogeosciences. I have only included the relevant questions:

1. Do the authors give proper credit to related work and clearly indicate their own new/original contribution? Yes. Some comparison with the literature on correlation length scales could benefit the paper.

2. Does the abstract provide a concise and complete summary? Mostly. It could be improved by stating the results more clearly.

Minor and specific comments

Abstract

I think the results should be stated more clearly and precisely in the abstract. It seemed a bit too vague to me. For example:

- l. 4: « to identify the functional types »: which are they exactly?

- l. 6: « identifying the (geographically varying) connectivity lengthscales and the clusters of spatial locations that are connected. » What are the main findings concerning the length scales? What are the different clusters? For the length scales, a result that seems particularly interesting is that the spatial variability is quite similar between variables, so that it only needs to be scaled by each variable's mean length scale.

- l. 9: « The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem »: it is difficult to agree with this last sentence, since the results were not clearly stated before it. Having finished reading the paper, I also do not think the results show how perturbations propagate in the ecosystem; rather, they offer an analysis framework for doing so.

- l. 9: « antrophogenic » -> anthropogenic

Introduction

l. 35: « an abstraction that will allow for smarter decision-making when considering data sampling and feature selection for ML. » It is not clear to me how and why an abstraction allows smarter decision-making.

l. 37-50: Very nice paragraph clearly stating the objective of the work.

Model and Data

Sec. 2.1: I think it would be nice to have a bit more detail about the configuration, such as the numerical schemes, diffusion, viscosity, equation of state and which forcings are used (wind, temperature?), as well as how the simulations are run (spin-up procedure, initialisation...). The referenced papers should only be needed for further details; the reader should not need to read them to get a basic understanding of the configuration.

Methodology

Sec. 3.1: Maybe a figure in the supplement showing the raw and filtered time series could be useful to illustrate which timescales are filtered? Or maybe a periodogram? It should probably be stated earlier (in the introduction, or somewhere in the methods) which timescales are of interest, and why. Out of curiosity, have you tried your analysis with the seasonal signal included?
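A periodogram of the kind suggested above could be produced with plain NumPy. This is only a sketch: it assumes evenly spaced (for instance daily) samples at a single grid point, and the function name and `dt_days` parameter are illustrative rather than taken from the paper. Comparing raw and filtered series would simply mean calling it on both and plotting the two spectra together.

```python
import numpy as np


def periodogram(series, dt_days: float = 1.0):
    """Return (frequencies in cycles per day, power) for one evenly sampled series.

    The mean is removed first so that the zero-frequency bin does not dominate.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    spectrum = np.fft.rfft(x)
    power = np.abs(spectrum) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=dt_days)
    return freqs, power


# Synthetic example: a 30-day oscillation plus a weaker 400-day one
t = np.arange(1200)
signal = np.sin(2 * np.pi * t / 30) + 0.5 * np.sin(2 * np.pi * t / 400)
freqs, power = periodogram(signal)
peak_period_days = 1.0 / freqs[np.argmax(power)]
```

Peaks that disappear between the raw and filtered spectra would show directly which timescales the filtering removes.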

l. 154: « to a 21 km spatial resolution » makes me wonder whether the results are sensitive to the resolution of the model. Could the length scales be longer because of eddy mixing, or shorter because of dynamical barriers created by filaments or eddies? This also calls the isotropy assumption somewhat into question.

l. 162: I do not understand why the authors say: « As opposed to the biogeochemical lengthscales computed in Sect. 3.2.1 [...] here we manipulate the spatial networks to look at the spatial dependency of this length scale. » Section 3.2.1 also provides a map of the length scales that gives the spatial information (Fig. 2). I do not see the benefit of having these two definitions. Note that this also causes some confusion about which length scales are used in the different plots. For example, which one is shown in Fig. 4? And in Fig. 5? I gathered that Fig. 4 shows the length scale defined in Sec. 3.2.1 and Fig. 5 the one in Sec. 3.2.3, but this is not very clear.

l. 167: « black » rather than « red »?

Sec. 3.3: This part is not easy to follow. A short description of the objective at the beginning could help the reader: what are the objects to be clustered, and according to which criteria? If I understood correctly, the goal is to cluster grid points, for each variable, according to the temporal correlation between them, so that strongly correlated grid points are grouped together.
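Under the reading of Sec. 3.3 given above (grid points as nodes, correlations between their time series as edge weights, spectral clustering to group strongly correlated points), the idea can be illustrated with a deliberately simplified two-cluster version. This is not the authors' implementation: it assumes a symmetric, non-negative affinity matrix (for example |rho|) with no isolated nodes, and splits the nodes by the sign of the Fiedler vector of the symmetric normalised graph Laplacian. More than two clusters would normally be obtained by running k-means on several Laplacian eigenvectors.

```python
import numpy as np


def spectral_bipartition(weights: np.ndarray) -> np.ndarray:
    """Split the nodes of a weighted, undirected graph into two clusters (labels 0/1).

    weights: symmetric (n, n) non-negative affinity matrix, e.g. |rho| between
    grid-point time series. Every node must have a positive total weight.
    Uses L = I - D^{-1/2} W D^{-1/2}, the symmetric normalised Laplacian.
    """
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)  # ignore self-correlation
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
    _, vectors = np.linalg.eigh(laplacian)
    fiedler = d_inv_sqrt * vectors[:, 1]
    return (fiedler > 0).astype(int)


# Two groups of three grid points: strong correlation within, weak between
affinity = np.full((6, 6), 0.05)
affinity[:3, :3] = 0.9
affinity[3:, 3:] = 0.9
labels = spectral_bipartition(affinity)
```

Which group receives label 0 or 1 is arbitrary, since an eigenvector is only defined up to its sign.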

Results and Discussion

Sec. 4.1: As mentioned before, stating which length scale (the one from Sec. 3.2.1 or Sec. 3.2.3) the authors refer to would help the reader. Since two definitions of length scale seem to be used, it is natural to wonder how they compare.

l. 275-278: I think I got the general idea here: the spatial distribution of the length scale of a specific variable is the product of Fig. 5a and Fig. 4. However, since Fig. 4 and Fig. 5a do not seem to use the same definition of the length scale, this is a bit confusing.

Sec. 4.1: I am not familiar with length scales, but there seems to be some literature on them (based only on a quick search on Google Scholar). A comparison of the results and methods with this literature is missing. Are there other definitions of length scale? How does the method used in this paper compare with others? Are the length scales similar to previous estimates?

Fig. 7: How is it done? I guess it is some kind of generalisation of Fig. 6 but it would be good to know more than « We used those robust boundaries to identify 13 regions representing areas of NWES connectivity. Results of this regionalisation are represented in Fig. 7. » (line 315)
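Following the overview at the start of this report, the regions are based on the fraction of variables whose cluster map has a boundary at each grid point. A minimal sketch of that count is given below, assuming each variable's clustering is available as a 2-D integer label map on the model grid; land masking and the treatment of diagonal neighbours are ignored, and the function names are hypothetical.

```python
import numpy as np


def boundary_mask(labels: np.ndarray) -> np.ndarray:
    """Mark grid cells whose cluster label differs from a 4-neighbour (left/right/up/down)."""
    labels = np.asarray(labels)
    mask = np.zeros(labels.shape, dtype=bool)
    diff_x = labels[:, 1:] != labels[:, :-1]  # label changes between columns
    diff_y = labels[1:, :] != labels[:-1, :]  # label changes between rows
    # Mark both cells on either side of each label change
    mask[:, 1:] |= diff_x
    mask[:, :-1] |= diff_x
    mask[1:, :] |= diff_y
    mask[:-1, :] |= diff_y
    return mask


def boundary_fraction(label_maps) -> np.ndarray:
    """Fraction of variables whose cluster map has a boundary at each grid cell."""
    masks = np.stack([boundary_mask(m) for m in label_maps])
    return masks.mean(axis=0)


# Two hypothetical variables on a 2 x 4 grid: one split down the middle, one uniform
var_a = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
var_b = np.zeros((2, 4), dtype=int)
fraction = boundary_fraction([var_a, var_b])
```

"Robust" boundaries could then be taken as the cells where this fraction exceeds a chosen threshold; the threshold actually used in the paper is not stated here.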

l. 350: « or build simpler models than ERSEM »: I think this needs to be phrased a bit differently. Model complexity tends to increase in order to better represent (or in the hope of better representing) the real world, and NPZD models with just one phytoplankton, one zooplankton... already exist. Here the issue is to simplify ERSEM while keeping an accurate representation. Maybe something like line 51: « simplified (yet realistic with respect to the objectives) ».

l. 363-366: I do not see that in Fig. 8. The mean correlation between POM (yellow) and the Higher Trophic Levels + DOM (pink) is rather low. The authors should clarify.

Conclusions

l. 410-426: Here you are more specific about the results, and this could be used in the abstract. E.g. « we can conclude that the biogeochemical lengthscales vary significantly between variables and are not directly transferable. » or « we have provided an approximation for the lengthscale of each variable, and each spatial location, that is informed by the high correlation in the spatial variability between lengthscales of each variable »...

l. 421-424: « Our analysis demonstrated that the chemical components (e.g., nitrogen, carbon, silicon. . . etc) of each pelagic variable (e.g., diatoms, nanophytoplankton, microzooplankton) are closely linked and a simpler version of the model can be built, by reducing these variables through parametrization. » I do not know ERSEM, but I assume that, like many models, it started from a simple version whose complexity was then increased (e.g. by adding more phytoplankton types). How does the grouping compare with an earlier, simpler version of ERSEM? I suppose it would be relatively similar (e.g. all phytoplankton types gathered into one); however, it would be quite interesting if some groupings were different.
Extra comments

« lengthscales »: After a quick search on Google Scholar, it seems to be more commonly written « length scales » or « length-scales ».

The regions defined in Fig. 7 could be used to sample the domain when analysing the inter-variable interaction network. For example, one could select grid points within a single region only and compare the result with the same analysis for another region. Are the interactions between variables different between two regions? Alternatively, one could sample evenly across the regions to obtain a fair general representation. This point is mostly out of curiosity, as it seems natural to try to use these regions.

l. 367: Butenschon et al. (2015) and Butenschon et al. (2016) are essentially the same paper (2015 is the discussion version of 2016). Better to keep only 2016.

Citation: https://doi.org/10.5194/egusphere-2023-475-RC2
- AC2: 'Reply on RC2', Ieuan Higgs, 21 Jul 2023
  
  Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
  Best wishes,
  Ieuan Higgs and the co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2023-475-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (10 Aug 2023) by Marilaure Grégoire

AR by Ieuan Higgs on behalf of the Authors (21 Sep 2023): Author's response

EF by Vitaly Muravyev (04 Oct 2023): Manuscript, Author's tracked changes

ED: Referee Nomination & Report Request started (09 Oct 2023) by Marilaure Grégoire

RR by Damien Couespel (14 Oct 2023)

RR by Anonymous Referee #1 (15 Nov 2023)

ED: Publish subject to minor revisions (review by editor) (17 Nov 2023) by Marilaure Grégoire

AR by Ieuan Higgs on behalf of the Authors (21 Nov 2023): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (28 Nov 2023) by Marilaure Grégoire

AR by Ieuan Higgs on behalf of the Authors (05 Dec 2023)

Journal article(s) based on this preprint

08 Feb 2024

Investigating ecosystem connections in the shelf sea environment using complex networks

Ieuan Higgs, Jozef Skákala, Ross Bannister, Alberto Carrassi, and Stefano Ciavatta

Biogeosciences, 21, 731–746, https://doi.org/10.5194/bg-21-731-2024, 2024

Supplement

https://doi.org/10.5194/egusphere-2023-475-supplement

Viewed

Total article views: 2,107 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,427	596	84	2,107	263	121	216

Views and downloads (calculated since 17 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	69	28	5	102
May 2023	21	6	0	27
Jun 2023	44	10	5	59
Jul 2023	37	24	5	66
Aug 2023	17	6	1	24
Sep 2023	13	11	1	25
Oct 2023	17	17	2	36
Nov 2023	12	4	1	17
Dec 2023	24	12	3	39
Jan 2024	13	14	1	28
Feb 2024	5	4	2	11
Mar 2024	0	0	0	0
Apr 2024	0	0	0	0
May 2024	7	19	1	27
Jun 2024	15	18	5	38
Jul 2024	22	4	8	34
Aug 2024	36	2	8	46
Sep 2024	6	4	0	10
Oct 2024	6	20	0	26
Nov 2024	30	6	0	36
Dec 2024	10	6	0	16
Jan 2025	20	18	0	38
Feb 2025	34	14	0	48
Mar 2025	36	18	2	56
Apr 2025	42	14	6	62
May 2025	26	24	0	50
Jun 2025	18	28	0	46
Jul 2025	14	30	4	48
Aug 2025	62	36	4	102
Sep 2025	298	26	0	324
Oct 2025	44	34	0	78
Nov 2025	58	34	0	92
Dec 2025	48	14	2	64
Jan 2026	54	18	8	80
Feb 2026	76	14	4	94
Mar 2026	66	38	2	106
Apr 2026	52	10	1	63
May 2026	70	6	2	78
Jun 2026	1	5	0	6
Jul 2026	4	1	0	5

Viewed (geographical distribution)

Total article views: 2,094 (including HTML, PDF, and XML), of which 2,094 with geography defined and 0 of unknown origin.

Latest update: 19 Jul 2026

Download

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

A complex network is a way of representing which parts of a system are connected to other parts. We have constructed a complex network based on an ecosystem-ocean model. From this, we can identify patterns in the structure and areas of similar behaviour. This can help to understand how natural, or human-made, changes will affect the shelf-sea ecosystem, and can be used in multiple future applications such as improving modelling, data assimilation, or machine learning.
