This work is distributed under the Creative Commons Attribution 4.0 License.
A novel cluster-based learning scheme to design optimal networks for atmospheric greenhouse gas monitoring (CRO2A version 1.0)
Abstract. With the continued deployment of atmospheric greenhouse gas monitoring networks worldwide, optimal and strategic positioning of ground stations is essential to minimize network size while ensuring robust observation of fossil fuel emissions in large and diverse environments. In this study, a novel scheme (Concepteur de Réseaux Optimaux d’Observations Atmosphériques – CRO2A) is developed to design optimal mesoscale atmospheric greenhouse gas monitoring networks through a three-stage process of unsupervised clustering with inverse weighting and data processing. Unlike current approaches that rely primarily on inverse-modeling pseudo-data and heavily on error or uncertainty assumptions, this scheme requires no such assumptions; instead, it relies solely on direct atmospheric simulations of greenhouse gas concentrations. The CRO2A design scheme improves convergence to an optimal solution by minimizing the number of ground-based monitoring stations in the network while maximizing overall network performance. It can perform both foreground and background analyses and can assess and diagnose the quality of existing monitoring networks, among other special features. CRO2A treats simulated greenhouse gas concentration fields as spatiotemporal images, processed through multiple transformations, including data cleaning and automatic information extraction. These transformations reduce processing time and sensitivity to outliers and noise. The developed scheme incorporates techniques such as image processing and pattern recognition, supported by optimal heuristics derived from operations research, which enhance the ability to explore and exploit the problem search space during the solution process. Two applications are presented to illustrate the capabilities of the proposed optimal design scheme.
These are based on simulations of atmospheric CO2 concentrations from the Weather Research and Forecasting (WRF) model, one for an urban setting and the other for a regional case in eastern France, used to evaluate optimal network designs and the computational performance of the scheme. The results demonstrate that the design scheme is competitive, straightforward, and capable of solving the design problem while maintaining a balanced computational cost. Based on the WRF reference simulation, CRO2A performed analyses of foreground measurements (atmospheric signatures of fossil fuel emissions) and their associated background fields (where simulated large-scale background concentrations are used, avoiding major sources and sinks of greenhouse gases), providing the minimum number of ground-based measurement stations and their optimal locations in the regions. As additional features, CRO2A enables users to diagnose the performance of any existing network and improve it in the event of future expansion plans. Furthermore, it can be used to design and deploy an optimal monitoring network based on predefined potential locations within the region under analysis.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4112', Alecia Nickless, 08 Dec 2025
AC1: 'Reply on RC1', David Matajira-Rueda, 06 Feb 2026
The authors appreciate your time and willingness to review the proposed article. Following your comments and instructions, we have made the indicated and highlighted modifications in a new version of the document.
We also appreciate your comments and suggestions, which we value as they motivate us to continue exploring diverse application scenarios for CRO²A, as well as its possible future updates.
As you mention, most approaches focus on inverse modeling, and as is well known, its computational cost can be very high. Furthermore, it relies on information processed through Bayesian probabilistic assumptions. What we propose with CRO²A is the exploration of a different perspective from this traditional one, using a metric other than uncertainty reduction. The exploration is done automatically based on characteristics inherent to pattern recognition. The formulation of a different objective function allows both the exploration and exploitation of the solution space, in turn increasing convergence toward a global optimum.
We fully agree with you on the importance of the atmospheric transport model used and its appropriate parameter settings. The quality of CRO²A's results depends directly on the quality of the simulated data in the corresponding atmospheric transport model. Therefore, based on CRO²A's performance using data from other transport models (different from WRF and CAMS), we hope to enable their use in future versions to leverage the advantages of each.
Although we indirectly influence uncertainty, CRO²A focuses more on the trend of concentration behavior over time and space, taking advantage of automated analysis that, in turn, uses descriptive statistics to characterize them. The constraints applied to the data during processing are systematic, allowing the algorithm to learn from them without relying on assumptions that could bias the results.
We would like to highlight two of your comments, which precisely reveal some weaknesses in the inverse modeling design and definitely create an opportunity to test alternative solution strategies with CRO²A:
Firstly, regarding those locations heavily influenced by regions where prior information is deficient or highly uncertain, we believe that this could be tested in a later version of CRO²A through a set of simulations using different flux fields as input. The caveat is that the corresponding transport model realizations could carry a significant computational cost.
Secondly, regarding the penalty imposed on regions by approaches based on uncertainty reduction, we suggest avoiding this penalty by filling the "gaps" with the information provided by CRO²A, after using a set of realizations of the transport model (with the same flux field), so that transport errors are taken into account. The main limitation is the cost of running ensemble simulations (for both transport and emissions errors).
We also find your perspective on the inclusion of biogenic fields very interesting. We are well aware of this aspect and consider it undeniably necessary. Therefore, these fields have been under observation since the beginning of development, but the results are still being evaluated. The main difficulty lies in the "smoothness" or "flatness" of the biogenic fields. Our key to continuing research in this direction is the use of a complementary technique, which, at first glance, may be linked to a segmentation process based on the sets of species to be analyzed.
Finally, we would like to inform you that we already have comparative results with the Australian network presented in:
Ziehn, T., et al.: Greenhouse gas network design using backward Lagrangian particle dispersion modelling – Part 1: Methodology and Australian test case, Atmospheric Chemistry and Physics, 14, 9363–9378, 2014.
These results have already been appended to this article (Lines 636-691). We are also conducting tests on the African network presented in:
Nickless, A., et al.: Greenhouse gas network design using backward Lagrangian particle dispersion modelling – Part 2: Sensitivity analyses and South African test case, Atmospheric Chemistry and Physics, 15, 2051–2069, 2015.
However, these latter results will be published later.
About “specific comments”:
Figure 11, 13, 16: It's not clear what is the y-axis of the lower figure. Line 324-333
To clarify, we have added the following paragraph on the indicated lines:
Line 333-338
The logistic function representing the fitted model and its first two derivatives are used to calculate the optimal threshold (see the first and second panels of Fig. 11, respectively), following the procedure proposed by McDowall and Dampney (2006). This procedure uses intersections between certain straight lines (including the tangent at the midpoint of the logistic curve, whose slope is obtained from the derivative shown in the second panel of Fig. 11) to calculate the threshold and saturation points.
It should be noted that the performance represented is normalized; therefore, the vertical axes of this figure and its first two derivatives, shown in Fig. 11, are dimensionless.
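For concreteness, the straight-line construction described above can be sketched as follows. This is an illustrative reimplementation under an assumed standard logistic parameterization; the parameter values (k = 2, x0 = 5) are hypothetical, not the fitted ones from Fig. 11.

```python
import numpy as np

def logistic(x, k=2.0, x0=5.0):
    """Normalized performance as a function of network size (illustrative)."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def threshold_saturation(k, x0):
    # The first derivative of the logistic at its midpoint x0 gives the
    # slope of the tangent line there: f'(x0) = k / 4.
    slope = k / 4.0
    # Intersections of that tangent with the lower (y = 0) and upper
    # (y = 1) asymptotes yield the threshold and saturation points.
    threshold = x0 - 0.5 / slope
    saturation = x0 + 0.5 / slope
    return threshold, saturation

thr, sat = threshold_saturation(k=2.0, x0=5.0)
# With these illustrative parameters: thr = 4.0, sat = 6.0
```

The threshold obtained this way marks where the performance curve begins its steep rise, which is the quantity read off as the minimum network size.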
Figure 14, 17: the caption does not explain what’s in (d).
The figure captions have been corrected as shown below:
Figure 14. Resulting optimal centroids (black triangles) and their corresponding clusters for all images in the dataset for both the foreground (a) and background (b) networks (urban level), location of the optimal centroids relative to the emission field (c), and according to the scoring matrix (d).
Figure 17. Resulting optimal centroids (black triangles) and their corresponding clusters for all images in the dataset for both the foreground (a) and background (b) networks (regional level), location of the optimal centroids relative to the emission field (c), and according to the scoring matrix (d).
Table 2. It’s not clear from the title or row labels why there are 9 rows, or what the order signifies, if anything.
The numbering in Tables 1 and 2 has been removed, as it had no meaning other than to enumerate the monitoring stations obtained.
Furthermore, the caption of both tables has been modified as follows:
“Table 1. Coordinates of the optimal results for the urban-scale analysis according to Fig. 14. Two columns are presented: the first (Foreground) for the main monitoring network and the second (Background) for the background network. Both networks contain three ground monitoring stations, since they are designed as one-to-one networks and because 3 is the minimum value (threshold in Fig. 13) obtained from the analysis.”,
“Table 2. Coordinates of the optimal results for the regional-scale analysis according to Fig. 17. Two columns are presented: the first (Foreground) for the main monitoring network and the second (Background) for the background network. Both networks contain nine ground monitoring stations, since they are designed as one-to-one networks and because 9 is the minimum value (threshold in Fig. 16) obtained from the analysis.”,
respectively.
Thank you in advance for your attention and collaboration.
Sincerely,
David Matajira-Rueda
Charbel Abdallah
Thomas Lauvaux
Following your observations, the authors have included some words or comments to improve the understanding of the document; therefore, below we list the lines in which such inclusions are found:
Line 16
Lines 113-114
Line 184
Line 189
Line 326
Lines 344-347
Lines 435-436
RC2: 'Comment on egusphere-2025-4112', Anonymous Referee #2, 09 Jan 2026
The authors have presented an algorithm (CRO2A) for positioning an optimal CO₂ observing network for flux inversions, given a realistic modeled CO₂ field and various constraints such as the number of sites, locations of preexisting sites, and so on. The work itself is interesting and could present a faster alternative to traditional OSSEs that involve time- and resource-consuming flux inversions. My major concern about this work is the widespread conflation of mole fractions and surface fluxes throughout the work, and the implicit assumption that measuring where CO₂ is the highest gives us the most information about surface fluxes.
The point of a flux inversion is to infer surface fluxes – or more specifically, corrections to a first guess of surface fluxes – from observed atmospheric gradients of a species in space and time. A high CO₂ mole fraction by itself is not a signal of a large surface flux, nor are areas of high mole fraction always areas with the most significant fluxes, because the two are connected by atmospheric transport. For example, nighttime trapping of the respiration signal over a grassy area can create a near-surface CO₂ several hundred ppm above background, not because the fluxes are large but because the flux signal is trapped in a thin layer near the surface. Furthermore, high observed CO₂ does not necessarily provide a lot of information in a flux inversion unless it also disagrees with what we expect. Some of the highest CO₂ concentrations are right above power plant smokestacks, yet no one considers placing a CO₂ sensor there for a flux inversion because we *know* it’s going to be high, and if we have good statistics for that plant, we also know *how high* it’s going to be. A new monitoring location only provides information to a flux inversion if the observed CO₂ variations are *different* from what we expect a priori. Often the highest “bang for buck” for a new monitoring site is not where the CO₂ is highest, but where CO₂ variations are driven by fluxes from areas we’re interested in.
In the manuscript, the authors assume that areas of high CO₂ mole fraction are the ones that need to be observed for the best results in a flux inversion. This ignores both atmospheric transport and our knowledge of surface fluxes prior to doing an inversion. While the technique of deriving a set of locations given a proxy field seems sound in the manuscript, I do not agree with that proxy field being simulated CO₂ mole fractions. At the very least, the proxy field should be *changes* in the mole fraction in space and time, since fluxes lead to those changes. Even better, the proxy field should somehow incorporate the effect of transport and prior knowledge of surface fluxes. The OSSE studies referenced in the paper, although more expensive, are ultimately more useful because they do this implicitly, i.e., they tell us where to measure to have the most information about surface fluxes. Without accounting for transport and prior knowledge, I suspect the location map provided by CRO2A is of limited use.
I would like the authors to point out the fallacy in my thinking if I’m wrong in interpreting their work. If I’m not wrong, I’d like the authors to modify their method to account for transport, or at least delineate a method to design a monitoring network sensitive to the most significant *fluxes* as opposed to the highest *mole fractions*.
Apart from this major concern, I have the following minor concerns:
- It seems that the authors have only considered a simulated fossil CO₂ field. Even if the target was fossil CO₂ emissions, ignoring the confounding effect of the biosphere makes the problem simpler than in real life. I would like the authors to show the impact of a biospheric signal on their network design algorithm.
- Line 160: What is the baseline with respect to which storage requirements are reduced? Given a single field like CO₂ mole fractions, one would only *need* grayscale. So why start with RGB?
- Lines around 170: What does “magnitude” mean? Mole fraction of CO₂? How is it different from “intensity”?
- Line 203: Define “substantial”.
- Figures 11 and 13, line 316: What does “quality=1” or “performance=1” mean?
- Lines 385-387: So the point is to construct a tower network all inside Grand Est but sensitive to the Ruhr valley and Switzerland? Or is the domain of analysis all over Western Europe?
Citation: https://doi.org/10.5194/egusphere-2025-4112-RC2
AC2: 'Reply on RC2', David Matajira-Rueda, 06 Feb 2026
The authors appreciate your time and willingness to review the proposed article. Following your comments and instructions, we have made the indicated and highlighted modifications in a new version of the document. We also appreciate your comments and suggestions, which we value as they motivate us to continue exploring diverse application scenarios for CRO²A, as well as its possible future updates.
The authors have presented an algorithm (CRO2A) for positioning an optimal CO₂ observing network for flux inversions, given a realistic modeled CO₂ field and various constraints such as the number of sites, locations of preexisting sites, and so on. The work itself is interesting and could present a faster alternative to traditional OSSEs that involve time- and resource-consuming flux inversions. My major concern about this work is the widespread conflation of mole fractions and surface fluxes throughout the work, and the implicit assumption that measuring where CO₂ is the highest gives us the most information about surface fluxes.
The point of a flux inversion is to infer surface fluxes – or more specifically, corrections to a first guess of surface fluxes – from observed atmospheric gradients of a species in space and time. A high CO₂ mole fraction by itself is not a signal of a large surface flux, nor are areas of high mole fraction always areas with the most significant fluxes, because the two are connected by atmospheric transport.
As we are very aware of what an inversion does, we disagree with the statement. This point is of critical importance in our study, and has been an important topic of discussion before proposing the current framework.
The development of CRO²A does not assume that areas with a high CO₂ mole fraction are the ones that should be observed to obtain the best results in a flux inversion. If this were the case, very high individual and overall performances, close to 100% in all cases, would be expected, since the tendency would be to follow the highest mole-fraction values. However, the information in Tables 1 and 2, and especially subfigures (d) in Figs. 14 and 17, shows that the optimal locations obtained do not occupy the positions of the highest values in the scoring matrix; they tend to be close to them, while maintaining the proportion between frequency of presence and intensity. It should also be noted that as soon as the binarization transformation is applied to the input dataset, the data of interest (i.e., as has been mentioned repeatedly, those with considerable and measurable values) become simply pixels with an intensity of 1, forming the solution space. Therefore, when the clustering process is applied, the automated system treats them as equal, without any distinction related to their CO2 concentration. As a result, candidate locations are obtained and then refined using the scoring matrix, which does contain information about the frequency of spatial presence for each proposed location.
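As a toy illustration of this binarize-then-cluster behavior (a sketch of the general idea, not the CRO²A implementation), consider a synthetic concentration field with two hotspots. After thresholding, all retained pixels carry equal weight, and a plain k-means pass places one centroid per hotspot regardless of the hotspots' absolute intensities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "concentration image": two hotspots on a quiet background
img = rng.normal(0.0, 0.05, size=(60, 60))
img[10:20, 10:20] += 1.0
img[40:50, 35:45] += 1.0

# 1) Binarization: pixels above the threshold become 1, all others 0.
#    Every retained pixel is now treated equally, regardless of its
#    original concentration value.
mask = img > 0.5
coords = np.argwhere(mask).astype(float)  # (row, col) of solution-space pixels

# 2) Cluster the retained pixel coordinates (plain Lloyd's k-means).
def kmeans(points, k, iters=50, seed=1):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = []
        for j in range(k):
            pts = points[labels == j]
            # Keep the old center if a cluster happens to empty out
            new.append(pts.mean(axis=0) if len(pts) else centers[j])
        centers = np.array(new)
    return centers, labels

centers, labels = kmeans(coords, k=2)
# Each centroid lands near the center of one hotspot
```

In CRO²A the candidate centroids produced by the clustering stage are then scored against the frequency-of-presence matrix; the sketch above only covers the clustering stage.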
For example, nighttime trapping of the respiration signal over a grassy area can create a near-surface CO₂ several hundred ppm above background, not because the fluxes are large but because the flux signal is trapped in a thin layer near the surface.
While this example is plausible, it assumes that nocturnal accumulation only affects the grassy area, which seems very unlikely. When considering a CO2 field at a given time step, CO2 mole fractions are strongly connected to CO2 fluxes, and nighttime accumulation will affect more than one area or one field.
The actual question relates to the ability of an inversion to assimilate atmospheric gradients and to adjust the surface fluxes accordingly.
Furthermore, high observed CO₂ does not necessarily provide a lot of information in a flux inversion unless it also disagrees with what we expect. Some of the highest CO₂ concentrations are right above power plant smokestacks, yet no one considers placing a CO₂ sensor there for a flux inversion because we *know* it’s going to be high, and if we have good statistics for that plant, we also know *how high* it’s going to be.
“Because we *know*” is exactly the kind of assumption we have avoided with CRO2A. Most (if not all) network design studies for CO2 inversions have made assumptions on prior flux errors (what is supposed to be well known or not), including a wide range of user-specified spatial error correlations, temporal error correlations, and even error variances.
Instead, CRO2A focuses on specific tracers that users can define as inputs of the optimization system. In parallel, other tracers can be considered as perturbations, hence limiting the detection of signals.
A new monitoring location only provides information to a flux inversion if the observed CO₂ variations are *different* from what we expect a priori. Often the highest “bang for buck” for a new monitoring site is not where the CO₂ is highest, but where CO₂ variations are driven by fluxes from areas we’re interested in.
This is exactly what is proposed by CRO2A. Users can freely determine which emission fields are being constrained by atmospheric data.
In the manuscript, the authors assume that areas of high CO₂ mole fraction are the ones that need to be observed for the best results in a flux inversion. This ignores both atmospheric transport and our knowledge of surface fluxes prior to doing an inversion.
We can only disagree with this statement. First, we do not ignore the transport or the surface fluxes. And second, an inversion system absolutely requires a significant atmospheric signal to work with. If the fluxes do not produce any changes in the atmospheric concentrations, how is the inversion going to optimize the underlying fluxes? Assuming that data showing no changes in the CO2 concentration fields can still lead to robust inverse fluxes is not at all what Bayesian inversion suggests.
While the technique of deriving a set of locations given a proxy field seems sound in the manuscript, I do not agree with that proxy field being simulated CO₂ mole fractions. At the very least, the proxy field should be *changes* in the mole fraction in space and time, since fluxes lead to those changes. Even better, the proxy field should somehow incorporate the effect of transport and prior knowledge of surface fluxes.
The OSSE studies referenced in the paper, although more expensive, are ultimately more useful because they do this implicitly, i.e., they tell us where to measure to have the most information about surface fluxes. Without accounting for transport and prior knowledge, I suspect the location map provided by CRO2A is of limited use.
As we mentioned in the discussions of the document: “The development of the optimal design scheme is based on two types of measurements (direct and indirect) and on the measuring instruments themselves. The variety of instruments used in greenhouse gas monitoring requires them to be immersed in the gas flow to characterize their location relative to the measured concentrations. For this reason, CRO²A seeks to identify locations for ground-based measurement stations where greenhouse gas fluxes with considerable and measurable intensities are expected to occur most frequently and for the longest period of time. This approach is consistent with that of Nalini et al. (2019), who prioritized the location of monitoring network stations over the magnitude of uncertainty reduction, since the former depends on previous and observational uncertainty values."
The CRO²A processing recognizes the importance of the signal-to-noise ratio of the simulated CO₂ mole fractions and, therefore, proposes locations where the signal is not highly sensitive to noise. In other words, it allows us to differentiate between noise and a signal of interest. The essence of CRO²A processing is the weighting metric based on the proposed scoring matrix, since, as mentioned, this matrix represents the spatial distribution of significant (considerable and measurable) concentrations, expressed as the frequency of occurrence at each location.
Just as image processing was one of the pillars of our development, so too was pattern recognition, which was used to characterize and extract information from the dataset simulated by the WRF-Chem model. This technique, supported by an automatic unsupervised learning system, allowed us, for example, to dispense with wind fields, since their information was already implicitly or indirectly contained in the spatiotemporal data used as input. In fact, since the input dataset contains spatial and temporal information, the characterization of the fields can be done appropriately, taking into account the changes (historical behavior) of the CO2 mole fractions for each point in the analysis region. The data processing remains statistical and probabilistic, similar to the inverse modeling approach, as the inversely weighted clustering techniques are based on such concepts.
Finally, since the WRF-Chem transport model (used to generate the CO2 simulations) takes into account prior knowledge of the analysis region, CRO²A, which uses the outputs of that model, inherits the atmospheric transport information contained in WRF-Chem.
I would like the authors to point out the fallacy in my thinking if I’m wrong in interpreting their work. If I’m not wrong, I’d like the authors to modify their method to account for transport, or at least delineate a method to design a monitoring network sensitive to the most significant *fluxes* as opposed to the highest *mole fractions*.
___________________________________________________________________
We also find your perspective on the inclusion of biogenic fields very interesting. We are well aware of this aspect and consider it undeniably necessary. Therefore, these fields have been under observation since the beginning of development, but the results are still being evaluated. The main difficulty lies in the "smoothness" or "flatness" of the biogenic fields. Our key to continuing research in this direction is the use of a complementary technique, which, at first glance, may be linked to a segmentation process based on the sets of species to be analyzed.
Finally, we would like to inform you that we already have comparative results with the Australian network presented in:
Ziehn, T., et al.: Greenhouse gas network design using backward Lagrangian particle dispersion modelling – Part 1: Methodology and Australian test case, Atmospheric Chemistry and Physics, 14, 9363–9378, 2014.
These results have already been appended to this article (Lines 636-691). We are also conducting tests on the African network presented in:
Nickless, A., et al.: Greenhouse gas network design using backward Lagrangian particle dispersion modelling – Part 2: Sensitivity analyses and South African test case, Atmospheric Chemistry and Physics, 15, 2051–2069, 2015.
However, these latter results will be published later.
About “Apart from this major concern, I have the following minor concerns”:
- It seems that the authors have only considered a simulated fossil CO₂ field. Even if the target was fossil CO₂ emissions, ignoring the confounding effect of the biosphere makes the problem simpler than in real life. I would like the authors to show the impact of a biospheric signal on their network design algorithm.
From the beginning of this research, both anthropogenic and biogenic fields were considered. In this first version of CRO²A, we focused on anthropogenic fields. Biogenic fields are still under study because, given their properties, complementary techniques are needed to reveal key information to characterize them. Therefore, the authors hope to include them in a later version of the optimal scheme.
Lines 131-135
Although both anthropogenic and biogenic fields have been considered in this development, the latter present a unique challenge due to their specific characteristics, requiring the use of techniques complementary to those described herein; this first version of CRO2A focuses solely on anthropogenic fields. However, given the undeniable need to include biogenic fields, it is expected that these will be incorporated in a subsequent version.
- Line 160: What is the baseline with respect to which storage requirements are reduced? Given a single field like CO₂ mole fractions, one would only *need* grayscale. So why start with RGB?
We are referring to an improvement in information storage, defined by the difference between an RGB image (24 bits per pixel; memory usage: Height × Width × 3 bytes) and a grayscale image (8 bits per pixel; memory usage: Height × Width bytes). Typically, image processing is done in the color space most suitable for the application, and most techniques use all channels according to their proposed procedures. Indeed, given the input data (CO2 mole fraction field), grayscale is sufficient and appropriate for our purposes, and this is what CRO²A processing uses. It is also important to mention that the RGB space is used solely for visualization purposes (and is therefore optional for the user), not for processing. For this reason, the schematic flow chart in Fig. 1 makes no mention of the RGB color space in the general processing of CRO²A.
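The storage arithmetic in this reply can be checked directly; the image size below is illustrative, not taken from the paper:

```python
import numpy as np

h, w = 480, 640                              # illustrative image size
rgb  = np.zeros((h, w, 3), dtype=np.uint8)   # 24 bits (3 bytes) per pixel
gray = np.zeros((h, w),    dtype=np.uint8)   # 8 bits (1 byte) per pixel

ratio = rgb.nbytes / gray.nbytes             # grayscale needs 1/3 of the memory
# rgb.nbytes = h * w * 3, gray.nbytes = h * w, so ratio = 3.0
```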
- Lines around 170: What does “magnitude” mean? Mole fraction of CO₂? How is it different from “intensity”?
In some cases where we had (erroneously) used the term "magnitude", we were referring to the intensity at a point, that is, the value of the mole fraction of CO2 at that point. We appreciate your observation and have modified the term in the document to "values" on line 177 and to "intensities" on lines 180 and 191.
- Line 203: Define “substantial”.
The authors use the word "substantial" given its definition of "something of considerable importance, size, or value." What we are trying to express is that these mole fractions of the GHG are measurable and of considerable intensity, since they fall within the range of intensities to be analyzed. "Measurable" in turn defines those GHG intensities that can be measured with a specific instrument.
Given your observation, we have added the following annotation to the lines 209-210:
"(since they are of considerable importance, value, and therefore, measurable)"
- Figures 11 and 13, line 316: What does “quality=1” or “performance=1” mean?
The authors appreciate your observation, and since both terms have been used interchangeably, we have decided to use the term "performance" instead of "quality" throughout this document, as the former better represents what we wish to express.
Since the main metric is based on the scoring matrix, and in this matrix a value of 100% means that at that location a ground monitoring station will be in the middle of a signal of considerable value throughout the simulation time (depending on the dataset used), a performance score of 1 means that all ground monitoring stations are located at points where there are signals of considerable magnitude at all times. In other words, a performance score of 1 means that, according to the CRO2A optimality criteria, the obtained location cannot be improved.
To clarify, we have added the following paragraph on the indicated lines:
Line 339-343
The main metric is based on the scoring matrix, in which a value of 100% means that a ground-based measurement station at that location will be in the middle of a signal of considerable intensity for the entire simulation time (depending on the dataset used). A performance score of one therefore means that each and every ground-based measurement station is located at points where signals of considerable intensity are present at all times. In other words, a performance score of 1 means that, according to the optimality criteria of CRO2A, the location obtained cannot be improved.
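To make the definition concrete, here is a minimal sketch (toy data, not the CRO2A code) of a scoring matrix built as the per-pixel frequency of above-threshold signal, and of the resulting network performance score:

```python
import numpy as np

# Stack of T binarized concentration images (T x H x W), as produced by
# the thresholding step: 1 = considerable, measurable signal present.
T, H, W = 24, 8, 8
frames = np.zeros((T, H, W), dtype=int)
frames[:, 2, 3] = 1      # signal present at (2, 3) in every frame
frames[:12, 5, 6] = 1    # signal present half of the time at (5, 6)

# Scoring matrix: frequency of signal presence at each pixel (0..1).
score = frames.mean(axis=0)

def network_performance(stations):
    """Mean score over the station locations; 1.0 means every station
    sits on a signal of considerable intensity at all times."""
    return float(np.mean([score[r, c] for r, c in stations]))

p1 = network_performance([(2, 3)])           # -> 1.0 (cannot be improved)
p2 = network_performance([(2, 3), (5, 6)])   # -> 0.75
```

Under this toy definition, a score of 1 is reached only when every station location has a frequency of presence of 100%, mirroring the explanation above.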
- Lines 385-387: So the point is to construct a tower network all inside Grand Est but sensitive to the Ruhr valley and Switzerland? Or is the domain of analysis all over Western Europe?
The scope of the analysis extends to the entire area shown, which includes the regions already listed in the article. To avoid confusion, the authors have removed the phrase "in northeastern France" (line 409). The two applications (urban and regional) aim to assess the performance of our proposal at different scales.
Thank you in advance for your attention and collaboration.
Sincerely,
David Matajira Rueda
Charbel Abdallah
Thomas Lauvaux
Additional:
Following your observations, the authors have included some words and comments to improve the readability of the document; the lines containing these additions are listed below:
Line 16
Lines 113-114
Line 184
Line 189
Line 326
Lines 344-347
Lines 435-436
Matajira-Rueda et al. present a novel approach to the optimization of new ground-based stations in a greenhouse gas observation network. Many previous approaches have relied on the inverse-modelling methodology traditionally used to optimize flux estimates using concentration measurements from these ground-based network stations and prior information. This poses computational challenges, as the optimization requires running components of the inversion, which involve extremely large datasets, and repeating this a large number of times in order to determine which set of stations achieves the best result with respect to some objective function, usually related to uncertainty reduction. Instead, the authors propose a machine-learning approach based on identifying clusters in the region and then optimizing the locations of the sites that observe these clusters. Techniques are implemented to reduce the dimensionality of the data and improve the computational time of the algorithm, which in turn allows more repeats of the process to be undertaken with different starting values, ensuring that the optimal solution is achieved rather than a local optimum.
The authors present the approach in a logical and clear manner and clearly describe each step. The manuscript is easy to follow, even with no prior knowledge of inversions or machine learning. The figures and tables complement the explanation of the method and the discussion of the results.
I think that the manuscript is sufficient in its current form to present the proposed method and application.
I think it may be worth emphasizing that, regardless of which method is used for optimizing the location of measurement stations, there is still a requirement for a thorough understanding of the transport model(s) that will be used to generate the simulated concentrations, as locations where these models are known to perform poorly should be excluded from the search space. While the inverse-modelling approach may not be used for determining the optimal network, the resulting network still needs to be compatible with that approach and take into account the challenges that must be dealt with during the inversion procedure in order to achieve estimates of the posterior fluxes. For example, there needs to be an appreciation for the prior information that will be provided to the inversion, as the ultimate aim is to ingest the concentration data from the observation network, together with the prior information, to provide estimates of fluxes. Locations heavily influenced by regions where the prior information is poor or highly uncertain can be problematic: even if a new measurement station there contributes towards uncertainty reduction, the resulting posterior uncertainty is still very high, particularly if this is combined with error in the atmospheric transport model for that region. Approaches that use uncertainty reduction as the basis for the network-design objective function can penalize such regions by adjusting their uncertainty, so that solutions with stations observing these locations do not dominate at the cost of other regions that new stations could better help characterize. Regions with high uncertainty are normally also those with high concentrations, so I think both approaches would tend to find solutions that view the same regions.
The exception is CO2: during periods when photosynthesis dominates, air masses passing over these regions may pull concentrations in the surrounding areas lower, yet the uncertainty in the models that describe photosynthesis can be very high. If the objective were to improve on the prior fluxes for these regions, it would therefore still be desirable to have stations in the network that view them. The method may thus need some adaptations to account for large negative fluxes, or for regions with both large negative fluxes and anthropogenic fossil fuel contributions.
I’d certainly be interested to see how this method compares to the previous inverse modelling based approaches if both are provided with the same inputs.
Specific comments:
I think some clarifications in the captions would help make the figures and tables more stand-alone.
Figures 11, 13, 16: It is not clear what the y-axis of the lower panel represents.
Figures 14, 17: the captions do not explain what is shown in (d).
Table 2: It is not clear from the title or row labels why there are 9 rows, or what the order signifies, if anything.