the Creative Commons Attribution 4.0 License.
NitroNet – A deep-learning NO2 profile retrieval prototype for the TROPOMI satellite instrument
Abstract. We introduce "NitroNet", a deep learning model for the prediction of tropospheric NO2 profiles from satellite column measurements. NitroNet is a neural network, which was trained on synthetic NO2 profiles from the regional chemistry and transport model WRF-Chem, operated on a European domain for the month of May 2019. This WRF-Chem simulation was constrained by in-situ and satellite measurements, which were used to optimize important simulation parameters (e.g. the boundary layer scheme). The NitroNet model receives vertical NO2 column densities (VCDs) from the TROPOMI satellite instrument and ancillary variables (meteorology, emissions, etc.) as input, from which it reproduces NO2 concentration profiles. Training of the neural network is conducted on a filtered dataset, meaning that NO2 profiles with strong disagreement (> 20 %) to colocated TROPOMI column measurements are discarded.
We present a first evaluation of NitroNet on a variety of geographical domains (Europe, US west coast, India, and China) and different seasons. For this purpose, we validate the NO2 profiles predicted by NitroNet against monthly-mean satellite, in-situ, and MAX-DOAS measurements. The training data were previously validated against the same datasets. During summertime, NitroNet shows small biases and strong correlations to all three datasets (bias = +6.7 % and R = 0.95 for TROPOMI NO2 VCDs, bias = −10.5 % and R = 0.75 for AirBase surface concentrations). In the comparison to TROPOMI satellite data, NitroNet even shows significantly lower errors and stronger correlation than a direct comparison with WRF-Chem numerical results. During wintertime considerable low biases arise, because the summertime training data is not fully representative of all atmospheric wintertime characteristics (e.g. longer NO2 lifetimes). Nonetheless, the wintertime performance of NitroNet is surprisingly good, and comparable to that of classic RCT models. NitroNet can demonstrably be used outside the geographic domain of the training data with only slight performance reductions. What makes NitroNet unique compared to similar existing deep learning models is the inclusion of synthetic model data, which has important benefits: Due to the lack of NO2 profile measurements, empirical models are limited to the prediction of surface concentrations learned from in-situ measurements. NitroNet, however, can predict full tropospheric NO2 profiles. Furthermore, in-situ measurements of NO2 are known to suffer from biases, often larger than +20 %, due to cross sensitivities to photooxidants, which empirical models inevitably reproduce.
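The column-agreement filter mentioned in the abstract (discarding training profiles that disagree with the colocated TROPOMI column by more than 20 %) can be illustrated with a minimal sketch. The function name, the choice of the TROPOMI VCD as the normalisation, and the handling of missing values are assumptions made for illustration; the manuscript's actual filter, which according to the referee comments also involves the boundary layer height, may differ.

```python
import numpy as np

def filter_by_column_agreement(model_vcd, tropomi_vcd, max_rel_diff=0.20):
    """Return a boolean mask that is True for samples to keep.

    Hypothetical helper: a sample is kept when the model column deviates
    from the colocated TROPOMI column by at most `max_rel_diff`, relative
    to the TROPOMI value (an assumed normalisation).
    """
    model_vcd = np.asarray(model_vcd, dtype=float)
    tropomi_vcd = np.asarray(tropomi_vcd, dtype=float)

    # Samples with missing or non-positive satellite columns are discarded.
    valid = np.isfinite(model_vcd) & np.isfinite(tropomi_vcd) & (tropomi_vcd > 0)

    rel_diff = np.full(model_vcd.shape, np.inf)
    rel_diff[valid] = (
        np.abs(model_vcd[valid] - tropomi_vcd[valid]) / tropomi_vcd[valid]
    )
    return rel_diff <= max_rel_diff

# Example with made-up column values (arbitrary units).
mask = filter_by_column_agreement([1.0, 1.3, 0.85, 2.0], [1.0, 1.0, 1.0, np.nan])
```

The mask can then be applied to the profile array and all ancillary inputs alike, so that features and targets stay aligned.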
Status: final response (author comments only)
- RC1: 'Review of "NitroNet – A deep-learning NO2 profile retrieval prototype for the TROPOMI satellite instrument"', Anonymous Referee #1, 18 Jun 2024
The manuscript entitled "NitroNet - A deep-learning NO2 profile retrieval prototype for the TROPOMI satellite instrument" by Kuhn et al. presents a deep-learning model for NO2 profile retrieval. This work is innovative and valuable as it exploits the power of Machine Learning (ML) to capture the 3D distribution of NO2, which is typically inferred from Chemical Transport Models (CTMs). Moreover, given that previous ML models trained on ground measurements focus heavily on surface NO2 mapping, this work demonstrates the feasibility of using synthetic data as a training target to extend the model's predictive ability above the surface layer. In general, this work is commendable and extends the application of ML in the atmospheric sciences.
The manuscript is well-organized and informative, providing a relatively clear description of the development and implementation of NitroNet. However, upon closer inspection, some concerns need to be addressed before publication.
General comments:
- Line 102: Using only one month as training data cannot enable the model to learn seasonal variability, and the authors also acknowledge this limitation in lines 389-390. It is suggested to clarify the reason for choosing only one month in the input data preparation sections, which would be useful for the readers.
- Section 2.1: As the NO2 is mainly present at the near surface and much less in the upper layers of the atmosphere, the large difference in the magnitude of the NO2 between the layers can lead to a skewed distribution of the targets in the training data set. In this work, a feature transformation was applied to the input variables, but it is not clear whether the transformation was also applied to the target variable. If not, would this affect the predictive ability of the model in the higher layers? It would be beneficial to clarify this. It is also suggested to clarify the data splitting strategy (e.g. sample-based, space-based, or period-based).
- It is suggested that section 3.1 be merged with section 2.2 as both sections describe the model input.
- Section 3.3: It is necessary to explain more about the filtering strategy. The filtering takes the TROPOMI data and the ERA5 PBLH data as reference data, but their uncertainty should also be acknowledged here. Meanwhile, it would be beneficial to show the spatial distribution of the filtered training samples to check if there are still enough samples left for different grids within the study area. For example, the number or proportion of training samples left for a grid.
- Lines 247, 252: It's questionable whether the validation is fair here, as the test dataset is also filtered. The validation and test data sets are very important for examining the generalization performance of the model and should represent the unseen scenario to which the NitroNet is applied. The validation data is used to optimize the model and the test data is used as a final check. Considering that the model is not only used within the training month (i.e., May 2019), the filtering based on WRF results is not available when the model is applied to a different area and period. The current test result is likely to give an overly optimistic assessment of the model generalization. Therefore, it is suggested to complement the comparison based on the unfiltered test dataset.
- Lines 335-336: Omitting the urban stations from the validation does not seem to be a good choice. The dynamics of NO2 are closely related to human activity, while measurements outside urban areas mainly show low and relatively stable NO2. When the model results are compared only with these non-urban stations, it is difficult to evaluate the ability of NitroNet to capture surface NO2 dynamics.
- Lines 425-450 and Figure 13: This part of the study examines the seasonal performance of the NitroNet on a daily and monthly basis, with a particular focus on the monthly-mean results. Figure 13 illustrates that while both results exhibit comparable trends, the monthly-mean results are superior to the daily-mean results. Furthermore, the monthly-mean results are not simply the average of the daily-mean results. Considering that the NitroNet provides hourly outputs and the TROPOMI overpasses once a day, it is more realistic to report the daily means in practice. The use of monthly means may result in overly optimistic assessments, and authors also recognize that the use of monthly means can reduce statistical noise, as stated in line 449. It can be understood that the significant data gap interferes with the daily-mean results as stated in lines 446-449. Consequently, it is recommended that the authors aggregate all the daily means and then calculate the overall RMSE, bias, and Pearson correlation as a monthly evaluation.
- Section 2, conclusions, discussion, and outlook: Have the authors considered using surface NO2 from in-situ measurements as another target for model training? It is not necessary to emphasize the uniqueness of NitroNet over previous models by training it only on NO2. There are hundreds (even thousands) of stations measuring surface NO2, which is the dominant fraction of total NO2. Thus, using measured surface NO2 as an additional target could enrich the constraints for model training and improve model agreement with surface measurements. Although the authors point out the uncertainty inherent in the in-situ measurements several times, the uncertainty in the synthetic data and the remote sensing data cannot be ignored either. Therefore, it is better not to underestimate "training on ground measurements". The prospects of incorporating ground data into NitroNet training should be discussed.
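The evaluation recommended in the comment on Lines 425-450 (pooling all daily values of a month and computing RMSE, bias, and Pearson correlation once) can be contrasted with statistics on monthly means in a short sketch. The array layout (days × pixels, with NaN marking data gaps) and all function names are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def _stats(pred, obs):
    """RMSE, mean bias, and Pearson R over all finite pairs."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    ok = np.isfinite(pred) & np.isfinite(obs)
    pred, obs = pred[ok], obs[ok]
    return {
        "rmse": float(np.sqrt(np.mean((pred - obs) ** 2))),
        "bias": float(np.mean(pred - obs)),
        "r": float(np.corrcoef(pred, obs)[0, 1]),
        "n": int(pred.size),
    }

def pooled_daily_stats(pred_daily, obs_daily):
    """Pool every (day, pixel) pair of the month into one sample."""
    return _stats(pred_daily, obs_daily)

def monthly_mean_stats(pred_daily, obs_daily):
    """Average each pixel over the month first, then compare the means."""
    pred = np.asarray(pred_daily, dtype=float)
    obs = np.asarray(obs_daily, dtype=float)
    # Only days on which both datasets are available enter the mean.
    both = np.isfinite(pred) & np.isfinite(obs)
    count = both.sum(axis=0)
    pred_mean = np.divide(np.where(both, pred, 0.0).sum(axis=0), count,
                          out=np.full(count.shape, np.nan), where=count > 0)
    obs_mean = np.divide(np.where(both, obs, 0.0).sum(axis=0), count,
                         out=np.full(count.shape, np.nan), where=count > 0)
    return _stats(pred_mean, obs_mean)
```

Because temporal averaging suppresses day-to-day noise in both datasets, the monthly-mean statistics are generally expected to show a lower RMSE and a higher correlation than the pooled daily statistics, which is the effect the comment asks the authors to quantify.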
Minor comments:
- Line 14: Here the authors mention three datasets, so the statistics from MAX-DOAS validation should also be mentioned.
- Line 16 “summertime”: Maybe the “late spring” would be more appropriate.
- Lines 54-56: The typical uncertainty of MAX-DOAS is suggested to be mentioned.
- Lines 85-92: The statement of the benefits of the NitroNet approach could be further refined. Here are two points the authors might refer to:
- Leveraging the synthetic data can overcome the limitations of insufficient measurements and enable the ML model to perform the prediction task for a larger space with more dimensions.
- In addition to providing a substantial number of training samples, the synthetic dataset allows the model to learn a more general and physically plausible pattern of NO2, which could enhance the model's generalization performance.
- Lines 120-122: Although QA filtering is a common operation for TROPOMI data processing, it may result in significant missing data and limit the generalization of the ML model. As shown in Figure C4, such filtering may result in few samples over NO2 hot spots for model training. Moreover, does it mean the NitroNet can only be used when TROPOMI QA is greater than 0.75? A related discussion is suggested.
- Lines 124-125: The resolution of the ERA5 hourly reanalysis data should be 0.25° rather than 0.125°. ERA5-Land hourly data have a resolution of 0.1° but do not seem to be used in this work. Also, the year of the reference provided seems to be 2016 instead of 2017 (http://www.ecmwf.int/en/newsletter/147/news/era5-reanalysis-production).
- Lines 144 and 194: The manuscript mentions that a Monte Carlo (MC) method is used to address the uncertainty. Is this MC dropout? Can the authors state how many times the prediction is evaluated when using this method? Is the model output the average of multiple MC runs? Information on the prediction uncertainty of an ML model is essential for model reliability, but I have not found the MC-derived uncertainty reported in this manuscript. Can the authors add it?
- Lines 103, 148: How does the “43 terrain-following pressure levels” enable the NitroNet model to output “186 levels”?
- Line 187: Can the authors state the total number of parameters or trainable parameters in NitroNet?
- Section 3.4: Will the out-of-distribution treatment be applied to the model application process or just to the training process? The marginal probability density distributions pxi(x) are calculated on the filtered training set which has only 7% data remaining. If the OOD treatment is also applied to the application process, will many instances be treated as OOD?
- Lines 238-240: As the NitroNet only has one output neuron for the NO2 concentration, how can an additional output for the ratio F be generated?
- Lines 318-320: It is suggested to add a validation experiment for 2019 only on the still valid stations in 2022. Considering the significantly reduced number of valid stations in Italy, the difference in the statistics shown in the manuscript makes less sense.
- Figure 8: Please clarify based on which temporal scale (every scatter point or the monthly-mean) these statistics are calculated.
- Figure 9: “within a radius of 5 km were drawn at 0 m altitude” needs a reference.
- Lines 507-509: The transformer model could also be considered (see https://www.microsoft.com/en-us/research/blog/introducing-aurora-the-first-large-scale-foundation-model-of-the-atmosphere/).
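For readers unfamiliar with the technique raised in the comment on Lines 144 and 194, Monte Carlo dropout can be sketched as follows. Whether NitroNet uses this scheme is exactly what the referee asks, so this is not a description of the authors' method; the tiny two-layer network, its weights, the dropout rate, and the number of samples are all hypothetical.

```python
import numpy as np

def mc_dropout_predict(x, w1, b1, w2, b2, p_drop=0.1, n_samples=100, seed=0):
    """Mean and standard deviation of repeated stochastic forward passes.

    Hypothetical network: one ReLU hidden layer with dropout, one linear
    output neuron. Dropout stays active at prediction time, so every pass
    uses a different random mask.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_2d(np.asarray(x, dtype=float))
    draws = np.empty((n_samples, x.shape[0]))
    for i in range(n_samples):
        hidden = np.maximum(x @ w1 + b1, 0.0)        # ReLU activation
        keep = rng.random(hidden.shape) >= p_drop    # random dropout mask
        hidden = hidden * keep / (1.0 - p_drop)      # inverted-dropout scaling
        draws[i] = (hidden @ w2 + b2).ravel()
    # The mean serves as the prediction, the spread as an uncertainty proxy.
    return draws.mean(axis=0), draws.std(axis=0)
```

In this scheme the answers to the referee's questions would be the value of `n_samples` and the use of the sample mean as model output; reporting the sample spread alongside the prediction is what the comment requests.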
Technical corrections:
- Line 18: “geographic domain” should be changed to “geographic and temporal domain”.
- Please pay attention to writing out acronyms at their first occurrence (e.g., TROPOMI, LIDAR, CAMS, etc).
- Line 28 “10 ug/m3”: Reference is needed.
- Line 44 “to be the main cause”: “to be one of the main causes”.
- Line 69 “CAMS”: “CAMS regional”.
Citation: https://doi.org/10.5194/egusphere-2024-1196-RC1
- AC1: 'Reply on RC1', Leon Kuhn, 22 Jul 2024
- RC2: 'Comment on egusphere-2024-1196', Anonymous Referee #2, 25 Jun 2024
Review of Kuhn et al., “NitroNet- A deep-learning NO2 profile retrieval prototype for the TROPOMI satellite instrument”
Reviewer suggestion: minor revisions.
This paper presents a new NO2 retrieval model to produce vertical profiles from satellite observations, using a machine learning approach. In my opinion, this is an impressive piece of work, thoroughly explained, well executed, and producing impressive results. The results of NitroNet comparisons to vertical columns and surface values, within and outside of the training times/regions, show very good promise.
I think that the main weakness of the paper lies in the challenge of verifying the NO2 vertical profiles, not just columns and surface values. This is inherent to the point of the paper of course, i.e., that NO2 vertical profile measurements are sparse. The authors tackle this by comparison to the FRM4DOAS MAX-DOAS dataset, and results are promising. I think it would improve the paper to include comparison to more MAX-DOAS datasets if possible, outside the European domain and over more seasons. Perhaps this could be achieved by looking at a few discrete layers in the profile, not necessarily full profile comparison plots. I also think the authors should consider whether verification against cloud-sliced NO2 data, or aircraft campaign NOx measurements, is an option to demonstrate the capability of NitroNet to provide information on the free- and upper-tropospheric part of the NO2 profile.
I have listed some specific minor revisions below.
Introduction:
- It is worth mentioning that there are methods of determining some vertically-resolved NO2 information from satellite observations, e.g. cloud-slicing, and also there are aircraft campaigns providing vertically-resolved NOx information.
- You mention that TROPOMI NO2 relies on a priori profiles, but it is also worth noting in your initial comments that the same is true for MAX-DOAS NO2 vertical profiles.
Line 89: ‘cannot’ rather than ‘can not’, and later in the sentence I think you mean ‘inherent to the training data’ not ‘immanent…’
Line 110: it would be helpful to the reader to include a brief comment on why the O3 VCDs are included in NitroNet.
Line 137: MAX-DOAS measurements are strongly influenced by clouds. You mention the filtering of clouds by virtue of the selected TROPOMI QA flag: is there a similar filtering for cloudy results for FRM4DOAS MAX-DOAS results?
Line 176: A reference for Shapley scores would be good here.
Line 191: This statement is a little unclear to me: ‘The learning rate was halved whenever training progress had stalled over several epochs’. Perhaps you could clarify?
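One common reading of the quoted sentence is a reduce-on-plateau schedule: the learning rate is multiplied by 0.5 once the monitored loss has failed to improve for a fixed number of consecutive epochs. The sketch below illustrates that reading only; the monitored quantity, the patience of 3 epochs, and the function name are assumptions, since the manuscript's actual settings are what the comment asks the authors to clarify.

```python
def halve_on_plateau(val_losses, lr0=1e-3, patience=3, min_delta=0.0):
    """Return the learning rate in effect after each epoch.

    Hypothetical schedule: the rate is halved whenever the validation loss
    has not improved by more than `min_delta` for `patience` epochs in a row.
    """
    lr = lr0
    best = float("inf")
    stalled = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss       # progress: remember the new best loss
            stalled = 0
        else:
            stalled += 1      # no progress in this epoch
            if stalled >= patience:
                lr *= 0.5     # training has stalled: halve the rate
                stalled = 0
        schedule.append(lr)
    return schedule

# Example: the loss stalls at 0.9 for three epochs, so the rate is halved once.
rates = halve_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.8], lr0=1.0, patience=3)
```

Stating the patience, the improvement threshold, and the monitored metric in the manuscript would resolve the ambiguity the referee points out.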
Line 211: Is the low bias you mentioned improved or worsened if the filtering criteria are relaxed from the tuned DVCD and DPBLH?
Line 251: high NO2 in the upper troposphere is also linked to long lifetime of NOx reservoirs, lightning and subsidence from the stratosphere.
Line 254: Could you provide a brief comment on why the model performs better at high NO2 concentration than low? Is this largely due to the better agreement in the lower troposphere/more polluted layers?
Figure 5: I presume that the WRF-Chem comparison to Airbase is achieved with in-situ bias correction (F factor) calculated by WRF-Chem, and that the NitroNet comparison is achieved with F calculated by NitroNet? How well do the F factors agree between WRF-Chem and NitroNet? Could any discrepancies in F factor help explain any of the observed in-situ NO2 biases in Fig 5?
Figure 9: Is it possible to show the standard deviation of the mean monthly profiles for each technique? It would be interesting to know how significant the profile differences are in relation to the variability across the month. Just to clarify, have you only taken MAX-DOAS profiles from FRM4DOAS at the TROPOMI overpass time?
Line 375: You say in relation to profiles with elevated layers of NO2 that ‘NitroNet is unable to reproduce this profile type, most likely because the training dataset contains very few corresponding examples’. Is this something that can be rectified? In principle, or even better if you’re able to show it, is it possible to provide more elevated layer examples in the synthetic training data to address this problem?
Line 400-401: There are a number of outstanding research questions related to NOx over the oceans, for example the contribution of ship emissions in the lower troposphere, and the role of lightning in upper tropospheric NOx over the ocean. Is your hypothesis here that NitroNet performs worse over the oceans because the model gets ship NOx emissions wrong, biasing your training set? Rather than state that the oceanic regions are less relevant, it would be good to understand your thoughts on how NitroNet could be improved over the oceans.
Line 425: In terms of seasonal performance of the vertical profiling capability, it would be really valuable to assess NitroNet against the FRM4DOAS network over seasonal timescales. Seasonal comparison at a few specific altitudes, e.g. 0, 1, 3 km, would give an indicator of whether NitroNet consistently achieves its aim of providing NO2 vertical profiles.
Figure 13: I may be missing something here, but I’m unsure how the monthly mean correlation coefficients can be almost all above the daily mean correlation coefficients, and the monthly mean RMSE can often be below all the daily RMSE values for a given month (e.g. Apr-Jul 2022)?
Citation: https://doi.org/10.5194/egusphere-2024-1196-RC2
- AC2: 'Reply on RC2', Leon Kuhn, 22 Jul 2024