A novel cluster-based learning scheme to design optimal networks for atmospheric greenhouse gas monitoring (CRO2A version 1.0)
Abstract. With the continued deployment of atmospheric greenhouse gas monitoring networks worldwide, optimal and strategic positioning of ground stations is essential to minimize network size while ensuring robust observation of fossil fuel emissions in large and diverse environments. In this study, a novel scheme (Concepteur de Réseaux Optimaux d’Observations Atmosphériques – CRO2A) is developed to design optimal mesoscale atmospheric greenhouse gas monitoring networks through a three-stage process of unsupervised clustering with inverse weighting and data processing. Unlike current approaches that rely primarily on inverse-modeling pseudo-data and heavily on error or uncertainty assumptions, this scheme requires no such assumptions; instead, it relies solely on direct atmospheric simulations of greenhouse gas concentrations. The CRO2A design scheme improves convergence to an optimal solution by minimizing the number of ground-based monitoring stations in the network while maximizing overall network performance. It can perform both foreground and background analyses and can assess and diagnose the quality of existing monitoring networks, among other special features. CRO2A treats simulated greenhouse gas concentration fields as spatiotemporal images, processed through multiple transformations, including data cleaning and automatic information extraction. These transformations reduce processing time and sensitivity to outliers and noise. The developed scheme incorporates techniques such as image processing and pattern recognition, supported by optimal heuristics derived from operations research, which enhance the ability to explore and exploit the problem search space during the solution process. Two applications are presented to illustrate the capabilities of the proposed optimal design scheme. 
These are based on simulations of atmospheric CO2 concentrations from the Weather Research and Forecasting (WRF) model, one for an urban setting and the other for a regional case in eastern France, used to evaluate optimal network designs and the computational performance of the scheme. The results demonstrate that the design scheme is competitive, straightforward, and capable of solving the design problem while maintaining a balanced computational cost. Based on the WRF reference simulation, CRO2A performed analyses of foreground measurements (atmospheric signatures of fossil fuel emissions) and their associated background fields (where simulated large-scale background concentrations are used, avoiding major sources and sinks of greenhouse gases), providing the minimum number of ground-based measurement stations and their optimal locations in the regions. As additional features, CRO2A enables users to diagnose the performance of any existing network and improve it in the event of future expansion plans. Furthermore, it can be used to design and deploy an optimal monitoring network based on predefined potential locations within the region under analysis.
Matajira-Rueda et al. present a novel approach to optimizing the placement of new ground-based stations in a greenhouse gas observation network. Many previous approaches have relied on the inverse modelling methodology, which is traditionally used to optimize flux estimates using concentration measurements from these ground-based network stations and prior information. This poses computational challenges, as the optimization requires running components of the inversion that involve extremely large datasets, and repeating this a large number of times in order to determine which set of stations achieves the best result with respect to some objective function, usually related to uncertainty reduction. The authors propose a machine-learning approach based on identifying clusters in the region and then optimizing the location of sites that observe these clusters. Approaches are implemented to reduce the dimensionality of the data and thereby the computational time of the algorithm, which allows for more repeats of the process with different starting values to ensure that the optimal solution is reached, rather than a local optimum.
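For illustration only, the cluster-then-locate idea with repeated restarts can be sketched as below. This is not the authors' CRO2A implementation: it assumes a plain weighted k-means (a swapped-in, generic technique) applied to a synthetic two-plume field, and every name (`field`, `weighted_kmeans`, the plume positions, the grid size) is hypothetical.

```python
import math
import random

def field(x, y):
    """Synthetic concentration enhancement: two Gaussian plumes (assumed positions)."""
    plumes = [(5.0, 5.0), (15.0, 14.0)]
    return sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / 8.0) for px, py in plumes)

def sq_dist(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def weighted_kmeans(points, weights, k, seed, iters=50):
    """Weighted k-means from one random start; returns (centers, weighted cost)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Accumulate weighted coordinate sums for the nearest center of each cell.
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            sums[j][0] += w * p[0]
            sums[j][1] += w * p[1]
            sums[j][2] += w
        # Move each center to the weighted mean of its cells (keep it if empty).
        centers = [
            (sx / sw, sy / sw) if sw > 0 else centers[j]
            for j, (sx, sy, sw) in enumerate(sums)
        ]
    cost = sum(w * min(sq_dist(p, c) for c in centers) for p, w in zip(points, weights))
    return centers, cost

# 20 x 20 grid of candidate cells, weighted by the synthetic field.
points = [(float(x), float(y)) for x in range(20) for y in range(20)]
weights = [field(x, y) for x, y in points]

# Repeat from several starting values and keep the lowest-cost solution.
runs = [weighted_kmeans(points, weights, k=2, seed=s) for s in range(10)]
best_centers, best_cost = min(runs, key=lambda r: r[1])
print(best_centers, best_cost)
```

Because each run is cheap, many restarts are affordable, which is the point made above about avoiding a local optimum.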
The authors present the approach in a logical and clear manner, and clearly describe each step. The manuscript is easy to follow, even for a reader with no prior knowledge of inversions or machine learning. The figures and tables complement the explanation of the method and the discussion of results.
I think that the manuscript is sufficient in its current form to present the proposed method and application.
I think it may be worth emphasizing that, regardless of which method is used for optimizing the location of measurement stations, there is still a requirement for a thorough understanding of the transport model or models that will be used to generate the simulated concentrations, as locations where these models are known to perform poorly should be excluded from the search space. While the inverse modelling approach may not be used for determining the optimal network, the resulting network still needs to be compatible with that approach and take into account the challenges that must be dealt with during the inversion procedure in order to obtain estimates of the posterior fluxes. For example, there needs to be an appreciation of the prior information that will be provided for the inversion, as the ultimate aim will be to ingest the concentration data from the observation network, together with the prior information, to provide estimates of fluxes. Locations that are heavily influenced by regions where the prior information is poor or highly uncertain can be problematic: even if a new measurement station in that region contributes towards uncertainty reduction, the resulting posterior uncertainty is still very high, particularly if this is combined with error in the atmospheric transport model for that region. Approaches that use uncertainty reduction as the basis for the objective function of the network design can penalize regions such as these by manipulating the uncertainty in these regions, so that solutions with stations that see these locations do not overly dominate at the cost of seeing other regions that new stations could better help to characterize. Regions with high uncertainty are normally also those with high concentrations, so I think that both approaches would try to find solutions that view the same regions. 
The exception is CO2: during periods when photosynthesis dominates, concentrations in surrounding regions influenced by air masses passing over these regions may be pulled lower, yet the uncertainty in the models that describe photosynthesis can be very high. If the objective were to improve on the prior fluxes for these regions, it would therefore still be desirable to have stations in the network that view them. Some adaptations to the method may thus be needed to account for large negative fluxes, or for regions that have both large negative fluxes and anthropogenic fossil fuel contributions.
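To make the point about uncertainty reduction concrete, here is a minimal sketch assuming the simplest linear-Gaussian case: a single flux, a single observation, and a scalar sensitivity. The function names and all numbers are hypothetical and are not taken from the manuscript.

```python
def posterior_variance(prior_var, obs_var, sensitivity):
    """Posterior flux variance for one flux and one observation (linear-Gaussian)."""
    return 1.0 / (1.0 / prior_var + sensitivity ** 2 / obs_var)

def uncertainty_reduction(prior_var, post_var):
    """Fractional reduction in standard deviation, 1 - sigma_post / sigma_prior."""
    return 1.0 - (post_var / prior_var) ** 0.5

# (prior variance, observation variance including transport error); assumed values.
cases = {
    "well-constrained prior, good transport": (1.0, 4.0),
    "poor prior, good transport": (100.0, 4.0),
    "poor prior, poor transport": (100.0, 400.0),
}

results = {}
for label, (prior_var, obs_var) in cases.items():
    post_var = posterior_variance(prior_var, obs_var, sensitivity=1.0)
    results[label] = (post_var, uncertainty_reduction(prior_var, post_var))
    print(f"{label}: posterior sd = {post_var ** 0.5:.2f}, "
          f"reduction = {results[label][1]:.1%}")
```

Under these assumed numbers, the poorly known region with good transport yields by far the largest reduction and would dominate an uncertainty-reduction objective, while the same region with large transport error shows the same fractional reduction as the well-constrained one but is left with a much higher posterior uncertainty.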
I’d certainly be interested to see how this method compares to the previous inverse modelling based approaches if both are provided with the same inputs.
Specific comments:
I think some clarifications in the captions would help the figures and tables to be more stand-alone.
Figures 11, 13, 16: It's not clear what the y-axis of the lower figure is.
Figures 14, 17: The caption does not explain what’s in (d).
Table 2. It’s not clear from the title or row labels why there are 9 rows, or what the order signifies, if anything.