the Creative Commons Attribution 4.0 License.
Ecosystem connections in the shelf sea environment using complex networks
Abstract. We use complex network theory to better represent and understand the ecosystem connectivity in a shelf-sea environment. The baseline data used for the analysis are obtained from a state-of-the-art coupled marine physics-biogeochemistry model simulating the North-West European Shelf (NWES). The complex network built on model outputs is used to identify the functional types of variables behind the biogeochemistry dynamics, suggesting how to simplify our understanding of the complex web of interactions within the shelf-sea ecosystem. We demonstrate that complex networks can be also used to understand spatial ecosystem connectivity, both identifying the (geographically varying) connectivity length-scales and the clusters of spatial locations that are connected. These clusters indicate geographic regions where there is a substantial flow of information between the degrees of freedom within the ecosystem, while information exchange across the boundaries of these regions is limited. The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem, and can be used in multiple future applications such as stochastic noise modelling, data assimilation, or machine learning.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Interactive discussion
Status: closed

RC1: 'Comment on egusphere-2023-475', Anonymous Referee #1, 16 Jun 2023
This paper analyses output from a complex biogeochemical model, ERSEM, using network analysis. The analysis is used for several purposes: evaluating the spatial length scale of the variables, determining areas of coherent biogeochemical interactions and boundaries of low connectivity, and establishing which variables are highly connected with each other. This information is useful when setting up regional systems, and when evaluating the interactions between model variables and whether the system can be approximated well by a simpler representation. The length scales are useful in data assimilation systems, when setting the area of influence of the observations. I think the paper provides new knowledge worth publishing, but first I would like the following points addressed:
- Only surface data are used. This is reasonable to reduce the amount of data, but it requires a discussion of the implications of such a choice. For example, in the resulting network (Figure 9) the detritus is completely disconnected from the phyto- and zooplankton, but since detritus quickly sinks out it does not remain at the surface; maybe using only surface data is the reason for this disconnect? There is also the question of whether other methods to reduce the data size, retaining more information throughout the water column, could have been used.
- The longer timescales are filtered out, so biogeochemical feedback mechanisms acting on timescales > 10 days may be lost. What happens when the resulting network is used to inform an emulator that is then applied in a climate context, as suggested by the authors? This also needs to be addressed in the discussion.
- Applicability of results: Would the results of this analysis be valid for other models? For example, could the length scales obtained be used in a data assimilation system using a BGC model other than ERSEM? Would the length scales apply when assimilating observations deeper in the water column, given that your results are based only on surface model data?
- The description of the methods could be improved for the benefit of the reader; I provide some suggestions for what needs to be clarified below.
Specific comments
Title: Could the title be improved by adding “Investigating” at the beginning?
Abstract:
The expression “functional types of variables” is used in the abstract and in the text, it is a bit unclear to me what this means. The expression becomes particularly confusing since the ERSEM itself also includes functional types of plankton. Consider either using a different expression or define it properly before using it.
“Be also used” should be “also be used”
What is meant by “flow of information between degrees of freedom”?
The first part of the last sentence is unclear to me: I don’t see it demonstrated anywhere how these results can be used to understand how a perturbation propagates through the ecosystem.
Line 38: “…investigate three relevant questions related …”: either formulate the three topics as questions or rewrite the sentence on line 38.
Line 40: “based on” should be “apply”.
Line 40: Is this length scale only useful when applying variational data assimilation, not other (ensemble) data assimilation techniques?
Line 49: As mentioned before, the use of the expression “functional type” is a bit confusing; please define it here.
Line 51: The statement that these traditional biogeochemical models are unsuitable for addressing the response to climate change effectively writes off all CMIP simulations and is quite severe; I would suggest moderating it. I do agree that lighter model systems are more suitable for ensemble simulations, but if they are trained on present-day data, they may not be very good at representing the future ecosystem response.
Line 88: Were the river nutrients also included and were they also annual?
Line 120: The transformation to the time-local standardised form is very well explained, but I wonder what happens in periods when the standard deviation is low or zero (for example in winter): do the standardised values stay finite?
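To make the concern concrete, here is a minimal sketch of a time-local standardisation. It assumes the transformation subtracts a centred running mean and divides by a centred running standard deviation. The window length `half_window` and the floor `eps` are illustrative assumptions, not values from the manuscript. Because the denominator is floored, a constant period (e.g. winter) produces zeros rather than a division by zero.

```python
import math


def time_local_standardise(series, half_window=5, eps=1e-8):
    """Standardise each value by the mean and standard deviation of a
    centred window (truncated at the ends of the series).

    half_window and eps are illustrative choices, not the manuscript's values.
    """
    n = len(series)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        window = series[lo:hi]
        mu = sum(window) / len(window)
        sd = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
        # Floor the denominator so near-constant periods (e.g. winter)
        # give finite values instead of dividing by zero.
        out.append((series[t] - mu) / max(sd, eps))
    return out


print(time_local_standardise([2.0] * 20)[:3])  # [0.0, 0.0, 0.0]
```

The floor keeps values finite, but it can still turn tiny fluctuations into large standardised values. Masking any window whose standard deviation falls below a threshold is an alternative the authors could state explicitly.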
Line 120: Would river input influence the network results? For example, would there be a stronger connection between the biogeochemistry and salinity in a region of strong river influence? I.e., would the network presented in Figure 9 differ from region to region?
Line 124: I did not see it specified anywhere that data were treated differently, so could you simply write that all data were treated this way?
Section 3.2.1, Biogeochemical length scale estimation: What did you do in regions close to land or the boundary? Did you skip the length scale computation there, or only consider the ocean points? The same question applies to the method in Sect. 3.2.2.
Difference between the methods in 3.2.1 and 3.2.2: Am I correct that the difference is that 3.2.1 is done on a finer grid and uses a different method to compute the length scale? Is the coarsening before computing the length scale primarily used to reduce the amount of data given to the SGC, or are there other reasons to compute length scales twice? This could be made clear in the paper.
Line 153: How was the grid upscaled from 7 to 21 km?
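A common way to go from 7 km to 21 km is to average non-overlapping 3 x 3 blocks. The sketch below assumes that approach; the manuscript may have used interpolation instead. It also shows one possible answer to the land-point question raised for Sect. 3.2.1: land cells, marked `None`, are skipped, and a coarse cell that contains no ocean points stays masked.

```python
def upscale_block_mean(field, factor=3):
    """Coarsen a 2-D grid by averaging non-overlapping factor x factor blocks.

    Land cells are None and are ignored. A coarse cell with no ocean points
    is itself None. Trailing rows/columns that do not fill a whole block are
    dropped.
    """
    ny, nx = len(field), len(field[0])
    coarse = []
    for j0 in range(0, ny - ny % factor, factor):
        row = []
        for i0 in range(0, nx - nx % factor, factor):
            vals = [field[j][i]
                    for j in range(j0, j0 + factor)
                    for i in range(i0, i0 + factor)
                    if field[j][i] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(row)
    return coarse


field = [[1.0, 1.0, 1.0, None, None, None],
         [1.0, 1.0, 1.0, None, 4.0, None],
         [1.0, 1.0, 1.0, None, None, None]]
print(upscale_block_mean(field))  # [[1.0, 4.0]]
```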
Lines 154-160, explanation of pruning: This is very hard to understand; please explain more clearly how this was done.
Line 170: the sentence starting “We took …” and the next sentence mean exactly the same thing. Remove the first sentence (or the last, up to the authors, but I prefer the last).
Before line 180: This is not easy to understand; could you please try to make it clearer:
“This was done by taking the mean length-scale at each grid point across all variables from the dynamically thresholded spatial networks. In order to assess whether this spatial variation could be well approximated by the mean of these length-scales, we compared the spatial distribution of length-scales between each different variable using Pearson’s correlation. Here, we would expect to see a high correlation if the structure of the spatially varying length-scales is consistent. This set of spatially varying length-scales was then represented as a ratio of the mean.”
Line 193: “a links … defined by the Spearman correlation.. ”: at this point several correlations have been introduced (the spatial correlation of each variable with itself on the 7 km grid, the length scale on the 21 km grid, and the correlations between the length scales of different variables). Which one does this refer to?
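As read here, the quoted procedure has three steps. First it computes a mean length-scale map across variables. Then it uses Pearson's correlation to check that the variables' maps share a common spatial pattern. Finally it expresses that pattern as a ratio of its domain mean. The sketch below follows that reading. The variable names in the example are hypothetical, and each map is assumed to be a flat list over the same ocean grid points.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mean_ratio_map(maps):
    """maps: dict mapping a variable name to its length scales at the
    same ocean grid points (flat lists of equal length)."""
    names = list(maps)
    npts = len(maps[names[0]])
    # Mean length scale at each grid point across all variables.
    mean_map = [sum(maps[v][k] for v in names) / len(names) for k in range(npts)]
    # Spatial pattern expressed as a ratio of its domain mean.
    domain_mean = sum(mean_map) / npts
    ratio = [m / domain_mean for m in mean_map]
    # Pairwise Pearson correlations between the variables' maps: high values
    # suggest that one shared pattern describes all variables.
    corr = {(a, b): pearson(maps[a], maps[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    return ratio, corr


ratio, corr = mean_ratio_map({"chl": [10.0, 20.0, 30.0],
                              "no3": [20.0, 40.0, 60.0]})
print(ratio)  # [0.5, 1.0, 1.5]
```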
Paragraph at lines 190-200: Please write out the equations on their own lines (as on page 7) and number them for the benefit of the reader.
Line 220: This is difficult to follow: “In order to compare the regionalisation of each variable, we first projected the cluster labels of each node back onto the horizontal plane. Then, we applied an edge detection kernel to identify the boundaries between differently labelled regions, creating a boundary map for each variable (with value 1 at boundary grid points and 0 elsewhere).” Please refer back to the appropriate equation on the previous page (ref. my comment above).
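The boundary-map step can be sketched as follows. A simple 4-neighbour comparison stands in for the edge-detection kernel, whose exact form is not given in the quoted text. Land cells (`None`) are assumed never to form boundaries. `boundary_fraction` then computes, at each grid point, the fraction of variables that have a boundary there, which is how RC2's overview describes the regionalisation.

```python
def boundary_map(labels):
    """Return 1 where a cell's cluster label differs from any 4-neighbour,
    else 0. Land cells (None) are never boundaries and are ignored as
    neighbours."""
    ny, nx = len(labels), len(labels[0])
    out = [[0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            if labels[j][i] is None:  # land: never a boundary point
                continue
            for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                jj, ii = j + dj, i + di
                if (0 <= jj < ny and 0 <= ii < nx
                        and labels[jj][ii] is not None
                        and labels[jj][ii] != labels[j][i]):
                    out[j][i] = 1
                    break
    return out


def boundary_fraction(label_maps):
    """Fraction of variables whose boundary map is 1 at each grid point."""
    maps = [boundary_map(lab) for lab in label_maps]
    ny, nx = len(maps[0]), len(maps[0][0])
    return [[sum(m[j][i] for m in maps) / len(maps) for i in range(nx)]
            for j in range(ny)]


print(boundary_fraction([[[0, 0, 1, 1]], [[0, 0, 0, 1]]]))  # [[0.0, 0.5, 1.0, 0.5]]
```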
Line 235: You calculate the adjacency matrix at 300 points randomly selected over the shelf (depth < 200 m) and then average them. Later you say “the boundaries particularly seem to reflect shallower bathymetry (approx. 100 m) than the 200 m depth usually applied to delimit the margins of shelf-seas, including NWES.” So why not sample within 100 m?
Line 255: Be precise: inclusion of new types of observations *for assimilation* …
Line 255: I suggest removing “profound”.
Line 265: suggest: “oxygen have different length-scales …”
Line 379: “… we applied SGC…”: did you also test different values of k here?
Line 390: “Ammonium dynamics are relatively more complex than the ones of nitrate.” This sentence can be removed.
Figure 9: How were the lines connecting the different variables decided?
Line 427: I suggest to use another word than “dismantling”.
Concerning the supporting information, it would be easier to understand if the plotted variables were given standard names and the y-axes were labelled with units.
Citation: https://doi.org/10.5194/egusphere-2023-475-RC1
AC1: 'Reply on RC1', Ieuan Higgs, 21 Jul 2023
Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
Best wishes,
Ieuan Higgs and the co-authors

RC2: 'Comment on egusphere-2023-475', Damien Couespel, 26 Jun 2023
Overview
In this paper, the authors use complex network theory with outputs from a model simulation of the North-West European Shelf (NWES) to identify 1) spatial correlation length scales of biogeochemical variables, 2) geographical regions with strong spatial correlation within them and weak correlation between them, and 3) correlations between biogeochemical variables. Point 1) is achieved by computing Spearman’s correlation coefficient between the time series of the different grid points. For point 2), for each variable, they build a spatial network with the previous coefficient, apply spectral graph clustering to group grid points, and identify the boundaries of these clusters. Then, they define the regions based on the fraction of variables that have a boundary at each grid point. For point 3), they compute Spearman’s correlation coefficient between the spatial distributions of each variable, build a network with it, and use spectral graph clustering to cluster biogeochemical variables. A first result of this work is to show that complex network theory can be used to identify biogeochemical regions based on spatial correlation, or to identify correlations between biogeochemical variables. This is of interest for reducing the complexity of biogeochemical dynamics and for helping the analysis of simulations. The correlation length scales are of interest for data assimilation, as they quantify the range of influence between grid points.
I very much enjoyed reading the paper. It is clear and well written. The results are of interest and worth publishing. It presents an interesting way to analyse biogeochemical model outputs. The definition of biogeochemical provinces is particularly interesting as it can help the analysis of models. The methods are clearly explained. I do not have major comments on the paper, but rather a list of minor or specific comments that I think could further improve the paper.
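Point 1) of this summary rests on Spearman's rank correlation between grid-point time series. The sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks. It then applies one hypothetical length-scale definition: the shortest separation at which the correlation with a reference point falls below 1/e. Neither the threshold nor the search order comes from the manuscript.

```python
import math


def ranks(xs):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks
    (undefined for a constant series)."""
    ra, rb = ranks(a), ranks(b)
    m = (len(a) + 1) / 2  # mean rank; tie-averaging preserves it
    cov = sum((x - m) * (y - m) for x, y in zip(ra, rb))
    va = sum((x - m) ** 2 for x in ra)
    vb = sum((y - m) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def correlation_length(ref, others, threshold=1 / math.e):
    """Shortest distance at which rho with ref drops below threshold.

    others: list of (distance_km, series). Returns None if it never drops.
    """
    for dist, series in sorted(others, key=lambda p: p[0]):
        if spearman(ref, series) < threshold:
            return dist
    return None


ref = [1, 2, 3, 4, 5, 6]
others = [(14.0, [6, 5, 4, 3, 2, 1]), (7.0, [1, 2, 3, 4, 6, 5])]
print(correlation_length(ref, others))  # 14.0
```

In practice `scipy.stats.spearmanr` computes the same coefficient and accepts many series at once.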
The comments that are more important are highlighted in red (see the pdf file attached for colored version).
As a summary of my comments, here are my answers to the review criteria at Biogeosciences. I just selected the relevant questions:
1. Do the authors give proper credit to related work and clearly indicate their own new/original contribution? Yes. Maybe a bit of comparison with the literature on correlation length scales could benefit the paper.
2. Does the abstract provide a concise and complete summary? Mostly. It could be improved by stating the results more clearly.
Minor and specific comments
Abstract
I think the results should be stated more clearly/precisely in the abstract. It seemed a bit too vague to me. For example:
- l. 4: « to identify the functional types »: which ones are they exactly?
- l. 6: « identifying the (geographically varying) connectivity length-scales and the clusters of spatial locations that are connected. » What are the main findings concerning the length scales? What are the different clusters? For the length scales, a result that seems particularly interesting is that the spatial variability is quite similar between variables, so that it only needs to be scaled by the mean length.
- l. 9: « The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem »: it is difficult to agree with that last sentence since the results were not clearly stated before. After finishing reading the paper, I also do not think the results help to understand how perturbations propagate in the ecosystem; rather, they offer an analysis framework to do that.
- l. 9: « antrophogenic » -> anthropogenic
Introduction
l. 35: « an abstraction that will allow for smarter decision-making when considering data sampling and feature selection for ML. » It is not that clear to me how and why abstraction can allow smarter decision-making.
ll. 37-50: Very nice paragraph clearly stating the objective of the work.
Model and Data
Sec. 2.1: I think it would be nice to have a bit more detail about the configuration: numerical schemes, diffusion, viscosity, equation of state, forcings (wind, temperature?), and how the simulations are run (spin-up procedure, initialisation...). References to other papers should be for further details only; the reader should not need to read these papers to get a basic understanding of the configuration.
Methodology
Sec. 3.1: Maybe a figure in the supplement showing the raw and filtered time series could be useful to illustrate which timescales are filtered? Or maybe a periodogram? It should probably be stated earlier (in the introduction? or somewhere in the methods?) which timescales are of interest, and why. Out of curiosity, have you tried your analysis with the seasonal signal?
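The sketch below illustrates the kind of filter the reviewer would like to see documented. It removes slow variability by subtracting a centred running mean. The 11-sample window (about 10 days for daily output) is an assumption chosen to echo RC1's remark about timescales longer than 10 days; it is not the manuscript's filter. Plotting `series` against `high_pass(series)` would give the raw-versus-filtered comparison suggested here.

```python
def high_pass(series, window=11):
    """Subtract a centred running mean (truncated at the ends) so that
    variability slower than roughly `window` samples is removed."""
    half = window // 2
    n = len(series)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        out.append(series[t] - sum(series[lo:hi]) / (hi - lo))
    return out


# A linear trend is removed exactly away from the ends of the series.
print(high_pass([float(t) for t in range(11)])[5])  # 0.0
```

A running mean has a leaky frequency response. A designed filter, such as a Butterworth filter applied forwards and backwards (`scipy.signal.butter` with `scipy.signal.filtfilt`), gives a cleaner cut-off.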
l. 154: « to a 21 km spatial resolution » makes me wonder whether the results are sensitive to the resolution of the model. Longer length scales because of eddy mixing? Or shorter ones because of dynamical barriers created by filaments or eddies? This also somewhat questions the isotropy assumption.
l. 162: I do not understand why the authors say: « As opposed to the biogeochemical length-scales computed in Sect. 3.2.1 [...] here we manipulate the spatial networks to look at the spatial dependency of this length scale. » In Sect. 3.2.1 you also have a map of the length scales that gives you the spatial information (Fig. 2). I do not see the point of having these two definitions. This also causes some confusion about which length scales are used in the different plots. For example, which one is used in Fig. 4? And in Fig. 5? I gathered that Fig. 4 uses the length scale defined in Sect. 3.2.1 and Fig. 5 the one in Sect. 3.2.3, but it is not very clear.
l. 167: « black » rather than « red »?
Sec. 3.3: This part is not easy to follow. Maybe a short description of the objective at the beginning could help the reader. What are the objects to be clustered, and according to which criteria? If I understood correctly, the goal is to cluster grid points, for each variable, according to their temporal correlation with each other, so that strongly correlated grid points are grouped together.
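Under the reading above, the clustering step groups grid points (nodes) joined by edges weighted by their temporal correlation. Spectral graph clustering generally uses the first k eigenvectors of the graph Laplacian followed by k-means. The sketch below covers only the k = 2 case (spectral bisection): it splits the nodes by the sign of the Fiedler vector, which it finds with power iteration. This illustrates the technique and is not the manuscript's code. It assumes a connected graph with non-negative weights.

```python
import math
import random


def spectral_bisection(W, iters=1000, seed=0):
    """Split nodes into two groups by the sign of the Fiedler vector.

    W: symmetric list-of-lists of non-negative edge weights
    (e.g. thresholded correlations between grid points).
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2 * max(deg)  # upper bound on the Laplacian spectrum (Gershgorin)
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant (lambda = 0) mode
        # w = (c I - L) v with L = D - W; its dominant remaining
        # eigenvector is the Fiedler vector of L.
        w = [c * v[i] - deg[i] * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Two tightly linked triangles joined by one weak edge (0.1).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(spectral_bisection(W))  # nodes {0, 1, 2} and {3, 4, 5} get different labels
```

For general k, scikit-learn's `SpectralClustering(n_clusters=k, affinity='precomputed')` accepts such a weight matrix directly. That would also make the k-sensitivity check asked about for Line 379 straightforward.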
Results and Discussion
Sec. 4.1: As mentioned before, stating which length scale (the one from Sect. 3.2.1 or Sect. 3.2.3) the authors refer to would help the reader. Since two definitions of length scale seem to be used, it feels natural to wonder how they compare.
ll. 275-278: I think I got the general idea here: the spatial distribution of the length scale of a specific variable is the product of Fig. 5a and Fig. 4. However, as the definition of the length scale seems to differ between Fig. 4 and Fig. 5a, it is a bit confusing.
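The reviewer's reading can be written out explicitly. The symbols are introduced here for illustration only. Let $L_v(\mathbf{x})$ be the length scale of variable $v$ at location $\mathbf{x}$, $N_v$ the number of variables, $\langle\cdot\rangle$ a mean over the domain, and $\bar{L}_v = \langle L_v \rangle$. Then:

```latex
m(\mathbf{x}) = \frac{1}{N_v}\sum_{v=1}^{N_v} L_v(\mathbf{x}), \qquad
r(\mathbf{x}) = \frac{m(\mathbf{x})}{\langle m \rangle}, \qquad
L_v(\mathbf{x}) \approx \bar{L}_v \, r(\mathbf{x})
```

The approximation is exact when every $L_v$ is a scaled copy $\bar{L}_v\,s(\mathbf{x})$ of a common pattern with $\langle s \rangle = 1$. In that case $m = s\,\frac{1}{N_v}\sum_v \bar{L}_v$, so $r = s$. High inter-variable correlation of the length-scale maps is evidence that this condition approximately holds.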
Sec. 4.1: I am not familiar with length scales, but there seems to be some literature on the topic (based on a quick search on Google Scholar). A comparison of the results and methods with this literature is missing. Are there other definitions of length scale? How does the method used in this paper compare with others? Are the length scales similar to previous estimates?
Fig. 7: How is it done? I guess it is some kind of generalisation of Fig. 6 but it would be good to know more than « We used those robust boundaries to identify 13 regions representing areas of NWES connectivity. Results of this regionalisation are represented in Fig. 7. » (line 315)
l. 350: « or build simpler models than ERSEM » I think this needs to be said a bit differently. The complexity of models tends to increase in order to better represent (or in the hope of better representing) the real world. NPZD models with just one phytoplankton, one zooplankton... already exist. Here the issue is to simplify ERSEM while keeping an accurate representation. Maybe something like line 51: « simplified (yet realistic with respect to the objectives) ».
ll. 363-366: I do not see that in Fig. 8. The mean correlation between POM (yellow) and the Higher Trophic Levels + DOM (pink) is rather low. The authors should clarify.
Conclusions
ll. 410-426: You are more specific about the results here, and this could be used for the abstract. E.g. « we can conclude that the biogeochemical length-scales vary significantly between variables and are not directly transferable. » or « we have provided an approximation for the length-scale of each variable, and each spatial location, that is informed by the high correlation in the spatial variability between length-scales of each variable »...
ll. 421-424: « Our analysis demonstrated that the chemical components (e.g., nitrogen, carbon, silicon. . . etc) of each pelagic variable (e.g., diatoms, nanophytoplankton, microzooplankton) are closely linked and a simpler version of the model can be built, by reducing these variables through parametrization. » I do not know ERSEM, but I assume that, like many models, it started from a simple version whose complexity has been increased (e.g. addition of more phytoplankton types). I am wondering how the grouping compares with a former, simpler version of ERSEM. I suppose it should be relatively similar (e.g. all types of phytoplankton gathered into only one); however, it would be quite interesting if some groupings were different.
Extra comments
« length-scales »: After a quick search on Google Scholar, it seems that it is more commonly written « length scales » or « lengthscales ».
The regions defined in Fig. 7 could be used for sampling the domain to analyse the inter-variable interaction network, for example by selecting grid points within only one region and comparing with the same analysis for another region. Are the interactions between variables different between two regions? Or by sampling evenly between the regions to obtain a fair general representation? This point is mostly out of curiosity, as it seems natural to try to use these regions.
l. 367: Butenschon et al. (2015) and Butenschon et al. (2016) are similar paper (2015 is the discussion version of 2016). Better to keep only 2016.

AC2: 'Reply on RC2', Ieuan Higgs, 21 Jul 2023
Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
Best wishes,
Ieuan Higgs and the co-authors

AC2: 'Reply on RC2', Ieuan Higgs, 21 Jul 2023
Interactive discussion
Status: closed

RC1: 'Comment on egusphere2023475', Anonymous Referee #1, 16 Jun 2023
This paper analyses output from a complex biogeochemical model, ERSEM, using network analysis. The analysis is used for several purposes: evaluating the spatial length scale of the variables, determining areas of coherent biogeochemical interactions and boundaries of low connectivity, and establishing which variabels are highly connected with each other. This information is useful when setting up regional systems. and evaluating the interactions between model variables and weather the system can be approximated well by a simpler representation. The length scales are useful in data assimilation systems, when setting the area of influence of the observations. I think the paper provide new knowledge worth publishing, but before I would like the following points addressed:
 Only surface data is used, this is reasonable to reduce the amount of data, but it would require a discussion of the implications of such a choice. For example, in the resulting network from the analysis (Figure 9) the detritus is completely disconnected from the photo and zooplankton, but as that quickly sinks out it would not remain one on the surface and maybe using only surface data is the reason for this disconnect? There is also a question wether there are other methods to reduce the data size that would retain more information throughout the watercolumn that could have been used?
 The longer timescales are filtered out, so there could be biogeochemical feedback mechanisms that work on timescales >10 days that are filtered out. So what happens when resulting network is used to inform an emulator, and then applied in the context of climate as suggested by the authors? This also needs to be addressed in the discussion.
 Applicability of results: Would this results of the analysis be valid other models? For example could the length scales obtained be used in data assimilation system using another BGC model than ERSEM? Would the length scales apply when assimilation observations deeper in the water column even if your results that are only based on surface model data?
 The description of the methods could be improved for the benefit of the reader, I provide some suggestions for what needs to be clarified below.Specific comments
Title: Could the title be improved but adding “Investigating” at the beginning?
Abstract:
The expression “functional types of variables” is used in the abstract and in the text, it is a bit unclear to me what this means. The expression becomes particularly confusing since the ERSEM itself also includes functional types of plankton. Consider either using a different expression or define it properly before using it.
“Be also used” should be “also be used”
What is meant by “flow of information between degree of freedom”
The first part of the last sentence is unclear to me: I don’t see that it is demonstrated anywhere how these results can be used to understand how a perturbation propagate through the ecosystem.
Line 38: “…investigate three relevant questions related …” either formulate the three topic as questions or rewrite the sentence on line 38.
Line 40: “based on” should be “apply”.
Line 40: Is this length scale only useful when applying variational data assimilation, not other (ensemble) data assimilation techniques?
Line 49: as mentioned before, the use of the expression the use of the expression ”functional type” is a bit confusing, please define it here.
Line 51: The statement that these traditional biogeochemical models are unsuitable to address response to climate change, effectively writing off all CMIP simulations is quite severe, I would suggest to moderate the statement. However I do agree that lighter model systems are more suitable for ensemble simulations, but it they are trained on data from the present day, they may not be very good at representing future ecosystem response.
Line 88: Were the river nutrients also included and were they also annual?
Line 120: the transformation to the timelocal standardised form is very well explained, but I wonder what happens in period when standar deviation is low or zero (for example I winter), does and stay finite?
Line 120: Would river input influence the network results, for example would there be a stronger connection between the biogeochemistry and salinity in a region of strong river influence. I.e. would the network presented in figure 9 differ from region from region to region?
Line 124: I did not see it specified anywhere that data were treated any differently, so could you just simply write that all dat were treated this way?
Sections 3.2.1: Biogeochemical length scale estimation: What did you do in regions close to land or the boundary? Did you not compute the length scale or only consider the ocean points? The same question applies to the method in 3.2.2
Difference between method in 3.2.1 and 3.2.2: Am I correct that the difference between 3.2.1 and 3.2.2 is that 3.2.1 is done on a finer grid and uses a different method to compute the length scale? The coarsening before computing the length scale is primarily used to reduce the amount of data given to the SGC? Is this correct or are there other resort to compute lengthscales twice? This could be made clear in the paper.
Line 153: How was the grid upscaled from 7 to 21 km?
Line 154160 Explanation of pruning: This is very hard to understand, please explain better how this was done.
LIne 170 from “We took …” and the next sentence mean exactly the same thing. Remove the first sentence (or last, up the author, but I preferred the last).
Before line 180: This is not easy to understand, could you please try to make this clearer:
“This was done by taking the mean lengthscale at each grid point across all variables from the dynamically thresholded spatial networks. In order to assess whether this spatial variation could be well approximated by the mean of these lengthscales, we compared the spatial distribution of lengthscales between each different variable using Pearson’s correlation. Here, we would expect to see a high correlation if the structure of the spatially varying lengthscales is
consistent. This set of spatially varying lengthscales was then represented as a ratio of the mean.”Line 193: “a links … defined by the Spearman correlation.. ” at this point there has been introduces severe spearman correlation, the length scale of the correlation with itself on a 7 km grid, the lengthscale om a 21 km grid and the correlations between the length scales of different variables, so which one does this refer to here?
Paragraph line 190200: Please write out the equations on its own line (as on page 7) and give them numbers to benefit the reader.
Line 220: This is difficult to follow: “In order to compare the regionalisation of each variable, we first projected the cluster labels of each node back onto the horizontal plane. Then, we applied an edge detection kernel to identify the boundaries between differently labelled regions, creating a boundary map for each variable (with value 1 at boundary grid points and 0 elsewhere).” Please refer back to the appropriate equation on the previous page (ref. my comment above).
Line 235: You calculate the mean adjacency matrix over 300 point randomly selected over the shelf <200 meter and then average that. Then later you say “the boundaries particularly seem to reflect shallower bathymetry (approx. 100 m) than the 200 m depth usually applied to delimit the margins
of shelfseas, including NWES.” So why not samle within 100 meters?Line 255: Be precise: inclusion of new types of observations *for assimilation* …
Line 255: I suggest to remove “profound”.Line 265: suggest: “oxygen have different lengthscales …”
Line 379: “… we applied SGC…”: did you also test different values of k here?
Line 390: “Ammonium dynamics are relatively more complex than the ones of nitrate.” This sentence can be removed.
Figure 9: How was the lines connecting the different variables decided?
Line 427: I suggest to use another word than “dismantling”.
Concerning the supporting information, this would be easier to understand if the variables plotted were given standard names and the yaxis were supplied with the units.
Citation: https://doi.org/10.5194/egusphere2023475RC1 
AC1: 'Reply on RC1', Ieuan Higgs, 21 Jul 2023
Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
Best wishes,
Ieuan Higgs and the coauthors

AC1: 'Reply on RC1', Ieuan Higgs, 21 Jul 2023

RC2: 'Comment on egusphere2023475', Damien Couespel, 26 Jun 2023
Overview
In this paper, the authors use complex network theory with outputs from a model simulation of the NorthWest European Shelf (NWES) to identify 1) spatial correlation length scales of biogeochemical variables, 2) geographical regions with strong spatial correlation within them and weak correlation between them and 3) correlations between biogeochemical variables. Point 1) is achieved by computing the Spearman’s correlation coefficient between the time series of the different grid points. For point 2), for each variable, they build a spatial network with the previous coefficient, apply spectral graph clustering to gather gridpoints and identify the boundaries of these clusters. Then, they define the regions base on the fraction of variables that have a boundary in each grid point. For point 3), they compute the Spearman’s correlation coefficient between the spatial distributions of each variable, build a spatial network with that and use the spectral graph clustering to cluster biogeochemical variables. A first result of this work is to show that complex network theory can be used to identify biogeochemical regions based on spatial correlation or to identify correlation between biogeochemical variables. This is of interest for reducing the complexity of biogeochemical dynamics and for helping the analysis of simulations. The correlation length scales are of interest for data assimilation as it quantify the range of the influence between grid points.I very much appreciated to read the paper. It is clear and well written. The results are of interest and worth to be published. It presents an interesting way to analyse biogeochemical model outputs. The definition of biogeochemical provinces is particularly interesting as it can help the analysis of models. The methods are clearly explained. I do not have major comments on the paper, but rather a list of minor or specific comments that I think could further improve the paper. 
The comments that are more important are highlighted in red (see the pdf file attached for colored version).
As a summary of my comments, here are my answers to the review criteria at Biogeosciences. I just selected the relevant questions:
1. Do the authors give proper credit to related work and clearly indicate their own new/original contribution? Yes. Some comparison with the literature on correlation length scales could benefit the paper.
2. Does the abstract provide a concise and complete summary? Mostly. It could be improved by stating the results more clearly.
Minor and specific comments
Abstract
I think the results should be stated more clearly and precisely in the abstract. It seemed a bit too vague to me. For example:
 l. 4: « to identify the functional types »: which ones are they exactly?
 l. 6: « identifying the (geographically varying) connectivity length-scales and the clusters of spatial locations that are connected. » What are the main findings concerning the length scales? What are the different clusters? For the length scales, a result that seems particularly interesting is that the spatial variability is quite similar between variables, so that it only needs to be scaled by each variable's mean length scale.
 l. 9: « The results of this study help to understand how natural, or antrophogenic, perturbations propagate through the shelf-sea ecosystem »: it is difficult to agree with this last sentence, since the results were not clearly stated before it. After reading the whole paper, I also do not think the results help to understand how perturbations propagate through the ecosystem; rather, they offer an analysis framework for doing so.
 l. 9: « antrophogenic » > anthropogenic
Introduction
l. 35: « an abstraction that will allow for smarter decision-making when considering data sampling and feature selection for ML. » It is not clear to me how and why abstraction allows smarter decision-making.
l. 37-50: Very nice paragraph clearly stating the objective of the work.
Model and Data
Sec. 2.1: I think it would be nice to have a few more details about the configuration, such as numerical schemes, diffusion, viscosity, equation of state and forcings (wind, temperature?), as well as how the simulations are run (spin-up procedure, initialisation...). The references to earlier papers should be for further details only; the reader should not need to read them to get a basic understanding of the configuration.
Methodology
Sec. 3.1: Maybe a figure in the supplement showing the raw and filtered time series could help illustrate which timescales are filtered out? Or perhaps a periodogram? It should probably be stated earlier (in the introduction, or somewhere in the methods) which timescales are of interest, and why. Out of curiosity, have you tried your analysis with the seasonal signal?
l. 154: « to a 21 km spatial resolution » makes me wonder whether the results are sensitive to the resolution of the model. Would the length scales be longer because of eddy mixing, or shorter because of dynamical barriers created by filaments or eddies? This also somewhat questions the isotropy assumption.
l. 162: I do not understand why the authors say: « As opposed to the biogeochemical length-scales computed in Sect. 3.2.1 [...] here we manipulate the spatial networks to look at the spatial dependency of this length scale. » In sec. 3.2.1 you also have a map of the length scales that gives you the spatial information (Fig. 2). I do not see the benefit of these two definitions. Note that this also causes some confusion about which length scale is used in each plot. For example, which one is shown in Fig. 4? And in Fig. 5? I gathered that Fig. 4 shows the length scale defined in sec. 3.2.1 and Fig. 5 the one defined in sec. 3.2.3, but this is not very clear.
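As an illustration of the periodogram suggested above, the following NumPy sketch compares the spectra of a raw and a filtered series. It assumes daily sampling and uses a hypothetical high-pass filter (subtracting a 61-day centred running mean) purely as a stand-in; the filter actually applied in Sect. 3.1 of the manuscript may differ.

```python
import numpy as np

def periodogram(x: np.ndarray, dt: float = 1.0):
    """Raw periodogram |FFT|^2 / n of a demeaned, evenly sampled series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, power

def remove_running_mean(x: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical high-pass filter: subtract a centred running mean.

    np.convolve(mode="same") zero-pads, which biases the first and
    last window // 2 samples.
    """
    kernel = np.ones(window) / window
    return x - np.convolve(x, kernel, mode="same")

n = 730  # two years of daily values
t = np.arange(n)
rng = np.random.default_rng(1)
raw = (2.0 * np.sin(2 * np.pi * t / 365)     # seasonal cycle
       + 0.5 * np.sin(2 * np.pi * t / 10)    # 10-day variability
       + 0.1 * rng.standard_normal(n))       # noise
filtered = remove_running_mean(raw, window=61)

freqs, p_raw = periodogram(raw)
_, p_filt = periodogram(filtered)
k_annual = int(np.argmin(np.abs(freqs - 1 / 365)))
k_10day = int(np.argmin(np.abs(freqs - 1 / 10)))
# Fraction of power kept by the filter at each frequency of interest.
print(p_filt[k_annual] / p_raw[k_annual], p_filt[k_10day] / p_raw[k_10day])
```

Plotting `p_raw` and `p_filt` against `freqs` on log axes would show directly which part of the spectrum the filter removes.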
l. 167: « black » rather than « red »?
Sec. 3.3: This part is not easy to follow. A short description of the objective at the beginning could help the reader: what are the objects to be clustered, and according to which criteria? If I understood correctly, the goal is, for each variable, to cluster grid points according to their mutual temporal correlation, so that strongly correlated grid points are grouped together.
Results and Discussion
Sec. 4.1: As mentioned before, stating which length scale (the one from sec. 3.2.1 or sec. 3.2.3) the authors refer to would help the reader. Since two definitions of length scale seem to be used, it is natural to wonder how they compare.
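The clustering step described in this comment can be illustrated with the simplest form of spectral graph clustering: a two-way split by the sign of the Fiedler vector of the normalised graph Laplacian. This is a minimal NumPy sketch on a toy network whose weights stand in for |Spearman rho|; the manuscript presumably uses a k-cluster variant, and its exact algorithm and parameters may differ.

```python
import numpy as np

def spectral_bipartition(adj: np.ndarray) -> np.ndarray:
    """Split graph nodes into two clusters using the normalised Laplacian.

    adj: symmetric, non-negative weighted adjacency matrix without self-loops.
    Returns an array of 0/1 cluster labels (the label values are arbitrary).
    """
    degree = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)  # assumes no isolated nodes
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)        # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]    # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy network: two groups of three strongly correlated grid points,
# joined by a single weak link.
adj = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 0.9
adj[2, 3] = adj[3, 2] = 0.1
labels = spectral_bipartition(adj)
print(labels)
```

For more than two clusters, the usual generalisation embeds each node with the first k eigenvectors and runs k-means on that embedding.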
l. 275-278: I think I got the general idea here: the spatial distribution of the length scale of a specific variable is the product of Fig. 5a and Fig. 4. However, since Fig. 4 and Fig. 5a do not seem to use the same definition of length scale, this is a bit confusing.
Sec. 4.1: I am not familiar with length scales, but there appears to be some literature on them (based on a quick search on Google Scholar). A comparison of the results and methods with this literature is missing. Are there other definitions of length scale? How does the method used in this paper compare with others? Are the length scales similar to previous estimates?
Fig. 7: How is it constructed? I guess it is some kind of generalisation of Fig. 6, but it would be good to know more than « We used those robust boundaries to identify 13 regions representing areas of NWES connectivity. Results of this regionalisation are represented in Fig. 7. » (line 315).
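For comparison with the literature, one simple and widely used way to define a correlation length scale (not necessarily either of the two definitions in the manuscript) is to fit an exponential decay, rho(d) = exp(-d / L), to correlation as a function of separation distance d. The sketch below assumes that model and fits it by least squares on log(rho) through the origin, using only positive correlations; a Gaussian or other correlation model would give a different L.

```python
import numpy as np

def efolding_length(distance: np.ndarray, corr: np.ndarray) -> float:
    """Fit rho(d) = exp(-d / L) by least squares on log(rho), through the origin.

    Only pairs with 0 < rho < 1 and d > 0 are used; L is returned in the
    units of `distance`.
    """
    mask = (corr > 0) & (corr < 1) & (distance > 0)
    d = distance[mask]
    y = np.log(corr[mask])                 # model: y = -d / L
    slope = np.sum(d * y) / np.sum(d * d)  # least-squares slope through origin
    return -1.0 / slope

d = np.linspace(5.0, 200.0, 40)  # separations in km
rho = np.exp(-d / 50.0)          # synthetic correlations with L = 50 km
print(efolding_length(d, rho))
```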
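The boundary step summarised at the start of this review (the fraction of variables whose clustering places a boundary at each grid point) might be sketched as follows. This assumes each variable's clustering is available as a 2-D integer label map on the model grid, treats a cell as a boundary cell when any of its four neighbours has a different label, and uses a hypothetical threshold to keep "robust" boundaries; land masking and the authors' actual neighbour definition and threshold are not reproduced.

```python
import numpy as np

def boundary_mask(labels: np.ndarray) -> np.ndarray:
    """True where a grid cell has a 4-neighbour with a different cluster label."""
    b = np.zeros(labels.shape, dtype=bool)
    diff_x = labels[:, 1:] != labels[:, :-1]  # east-west neighbour pairs
    diff_y = labels[1:, :] != labels[:-1, :]  # north-south neighbour pairs
    b[:, 1:] |= diff_x
    b[:, :-1] |= diff_x
    b[1:, :] |= diff_y
    b[:-1, :] |= diff_y
    return b

def boundary_fraction(label_maps: list) -> np.ndarray:
    """Fraction of variables whose clustering puts a boundary at each cell."""
    return np.mean([boundary_mask(m) for m in label_maps], axis=0)

def robust_boundaries(label_maps: list, threshold: float) -> np.ndarray:
    """Cells where at least `threshold` of the variables have a boundary."""
    return boundary_fraction(label_maps) >= threshold

a = np.array([[0, 0, 1, 1]] * 4)                    # east-west split
c = np.array([[0] * 4, [0] * 4, [1] * 4, [1] * 4])  # north-south split
frac = boundary_fraction([a, a.copy(), c])
print(frac)
print(robust_boundaries([a, a.copy(), c], threshold=0.5))
```

Contiguous areas of non-boundary cells could then be labelled as regions, for instance with `scipy.ndimage.label`; whether Fig. 7 was built this way is not stated in the passage quoted above.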
l. 350: « or build simpler models than ERSEM »: I think this needs to be phrased a bit differently. Model complexity tends to increase in order to better represent (or in the hope of better representing) the real world, and NPZD models with just one phytoplankton, one zooplankton... already exist. Here the issue is to simplify ERSEM while keeping an accurate representation. Maybe use something like line 51: « simplified (yet realistic with respect to the objectives) ».
l. 363-366: I do not see that in Fig. 8. The mean correlation between POM (yellow) and the Higher Trophic Levels + DOM (pink) is rather low. The authors should clarify this.
Conclusions
l. 410-426: Here you are a bit more specific about the results, and this could be used in the abstract, e.g. « we can conclude that the biogeochemical length-scales vary significantly between variables and are not directly transferable. » or « we have provided an approximation for the length-scale of each variable, and each spatial location, that is informed by the high correlation in the spatial variability between length-scales of each variable »...
l. 421-424: « Our analysis demonstrated that the chemical components (e.g., nitrogen, carbon, silicon. . . etc) of each pelagic variable (e.g., diatoms, nanophytoplankton, microzooplankton) are closely linked and a simpler version of the model can be built, by reducing these variables through parametrization. » I do not know ERSEM, but I assume that, like many models, it started from a simple version whose complexity was then increased (e.g. by adding more phytoplankton types). I wonder how the grouping compares with an earlier, simpler version of ERSEM. I suppose it would be relatively similar (e.g. all phytoplankton types gathered into one group); however, it would be quite interesting if some groupings were different.
Extra comments
« length-scales »: After a quick search on Google Scholar, it seems that it is more commonly written « length scales » or « lengthscales ».
The regions defined in Fig. 7 could be used to sample the domain when analysing the inter-variable interaction network. For example, one could select grid points within only one region and compare with the same analysis done for another region: are the interactions between variables different between two regions? Or one could sample evenly across the regions to obtain a fair general representation. This point is mostly out of curiosity, as it seems natural to try to use these regions.
l. 367: Butenschon et al. (2015) and Butenschon et al. (2016) are essentially the same paper (2015 is the discussion version of 2016). It is better to keep only 2016.
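The "sample evenly across the regions" option could be sketched as stratified sampling: draw the same number of grid points from each region before building the inter-variable network. In this minimal sketch the function name, the flattened region map and the convention that negative labels mark excluded cells (e.g. land) are all hypothetical.

```python
import numpy as np

def sample_per_region(region_ids: np.ndarray, n_per_region: int,
                      seed: int = 0) -> np.ndarray:
    """Return flat indices of grid points, drawing the same number from each region.

    region_ids: integer region label per grid point; negative values mark
    cells to skip (an assumed convention, e.g. land).
    """
    rng = np.random.default_rng(seed)
    flat = region_ids.ravel()
    chosen = []
    for r in np.unique(flat[flat >= 0]):
        idx = np.flatnonzero(flat == r)
        k = min(n_per_region, idx.size)  # small regions give all their points
        chosen.append(rng.choice(idx, size=k, replace=False))
    return np.concatenate(chosen)

regions = np.array([[0, 0, 0, 1],
                    [0, 0, 1, 1],
                    [-1, 2, 2, 2]])
idx = sample_per_region(regions, n_per_region=2)
print(idx, regions.ravel()[idx])
```

Restricting `region_ids` to a single region instead would give the per-region comparison suggested in the same comment.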

AC2: 'Reply on RC2', Ieuan Higgs, 21 Jul 2023
Thank you for taking the time and care to provide valuable feedback and contributions to this manuscript. Please see our responses to the comments in the attached PDF, which we are ready to implement for a future revision.
Best wishes,
Ieuan Higgs and the co-authors

Jozef Skákala
Ross Bannister
Alberto Carrassi
Stefano Ciavatta