the Creative Commons Attribution 4.0 License.
Hourly surface nitrogen dioxide retrieval from GEMS tropospheric vertical column densities: Benefit of using time-contiguous input features for machine learning models
Abstract. Launched in 2020, the Korean Geostationary Environmental Monitoring Spectrometer (GEMS) is the first geostationary satellite mission for observing trace gas concentrations in the Earth’s atmosphere. Observations are made over Asia. Geostationary orbits allow for hourly measurements, which leads to a much higher temporal resolution compared to daily measurements taken from low Earth orbits, such as by the TROPOspheric Monitoring Instrument (TROPOMI) or Ozone Monitoring Instrument (OMI). This work estimates the hourly concentration of surface NO2 from GEMS tropospheric NO2 vertical column densities (tropospheric NO2 VCDs) and additional meteorological features, which serve as inputs for Random Forests and linear regression models. With several measurements per day, not only the current observations but also those from previous hours can be used as inputs for the machine learning models. We demonstrate that using these time-contiguous inputs leads to reliable improvements regarding all considered performance measures, such as Pearson correlation or Mean Square Error. For Random Forests, the average performance gains are between 4.5 % and 7.5 %, depending on the performance measure. For linear regression models, average performance gains are between 7 % and 15 %. For performance evaluation, spatial cross validation with surface in-situ measurements is used to measure how well the trained models perform at locations where they have not received any training data. In other words, we inspect the models’ ability to generalize to unseen locations. Additionally, we investigate the influence of tropospheric NO2 VCDs on the performance. The region of our study is Korea.
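The time-contiguous setup described in the abstract can be sketched as follows: besides the current hour's features, lagged copies from the previous hours are appended as extra model inputs. This is a minimal illustration only; the feature names, synthetic data, and model settings are stand-ins, not taken from the paper.

```python
# Sketch of time-contiguous inputs: append lagged copies of each feature
# from the previous k hours, then train a Random Forest on the widened table.
# Column names and data are illustrative, not from the paper.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "trop_no2_vcd": rng.gamma(2.0, 1.0, n),     # satellite column (hypothetical)
    "temperature": 280 + 10 * rng.standard_normal(n),
    "surface_no2": rng.gamma(2.0, 5.0, n),      # target: in-situ surface NO2
})

k = 3  # number of previous hours to include
for lag in range(1, k + 1):
    for col in ("trop_no2_vcd", "temperature"):
        df[f"{col}_lag{lag}"] = df[col].shift(lag)
df = df.dropna()  # rows without a complete lag window are discarded

X = df.drop(columns="surface_no2")
y = df["surface_no2"]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
```

Note that widening the input window shrinks the usable dataset, since every row now needs a full history of valid observations; this trade-off is discussed in the referee comments below.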
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3145', Anonymous Referee #1, 25 Nov 2024
General Comments:
This paper uses machine learning models and a network of surface NO2 monitors to derive surface NO2 over South Korea with time-resolved NO2 satellite columns from the GEMS instrument. This is the first study to examine the use of time-contiguous inputs (i.e., ones from previous hours) to derive surface NO2 using machine learning. The authors show performance gains of 4.5-15%, depending on the performance measure and the model. Geostationary measurements of trace gases only became available with the launch of GEMS in 2020, and these kinds of studies are very interesting for assessing the benefits of using these new data sources.
I thought the paper was well written and clear. My expertise is in remote sensing, not machine learning, yet I found the description of the method easy to follow and even learned a few things. The figures and results clearly indicate that using earlier data improves the overall performance of these machine learning models. I would have liked to see a small discussion of how these improvements might change over the course of a day. The GEMS observations are probably much less accurate in the morning and late evening (high solar zenith angles, less sensitivity to the surface). Furthermore, the morning is limited in earlier time-contiguous observations, but the evening is not. How does the performance of the final result change as a function of time? Right now, the results at all times and locations are lumped together.
Two rather basic models are used (linear regression and Random Forest), which the authors chose to more easily isolate the performance changes. It’s not clear how the performance with time-contiguous data would change in other model setups. Do you expect those to have the same gains in performance?
Overall, I thought it was a nice paper and recommend it be published after the authors address a few minor comments.
Specific Comments:
Line 44: Change to “the measurement of lower tropospheric gases is not accurate”
Line 44: “This is why most studies estimated daily” doesn’t follow from your previous statement. The estimate they give is still at a specific time, not a daily average which is implied here. Clarify this sentence.
Line 105: I’m confused… where did j come from? The above equation uses t-k+1 (no t-j mentioned.)
Line 148: Would be useful for context to summarize accuracy of the NO2 product you use, both for troposphere and stratosphere. And how does this change over a day?
Line 153: The TM5 model may leave residual structure in the results… maybe mention resolution here.
Line 175: What kind of sensors are used? What is accuracy of the sensors?
Line 183: “We assume” – this seems like something that should be clear in a user guide, or the information could come from the data producers upon request. Is this a fact, or are you really making an assumption? Without more information, it could also be assumed that 1:00 UTC is describing the monthly average from 00:30-1:30 UTC. I generally find the time stamp discussion confusing. Wouldn’t it make sense to label this example as 2021/01/23/02, since at least two datasets are occurring around 2:00 UTC?
Line 209: Maybe I don’t know enough about how these models work, but I don’t understand how these negative values can be excluded, or why they have to be. Can you give some more justification? If the model is trained on a dataset that is biased at low column values of NO2, how does this affect results? If you don’t care about the bias but can’t handle negatives, why not add a background amount to make all the negatives positive to maximize use of all data? If you want to use the column values later to estimate surface NO2 in a given location but have negative values and haven’t considered them in the model, how can these be used?
Table 1: I think it would be useful to re-define N and give its unit here in caption.
Line 314: I’m not really clear about why latitude should get included at all as a feature in the first place. It’s good to see later that its inclusion doesn’t matter much, as the tropospheric VCD should have very little dependence on latitude in a physical way. Presumably the correlation in Table B1 is moderately high because in Korea the NO2 sources are dominated by a few cities including Seoul in the North, but the latitude is not the cause of enhanced tropospheric NO2. It could be important for other gases and larger domains, but not trop NO2 in a tiny area like South Korea.
Figures 2 and 3: Not a big deal but I’m not sure why left column has to be included… seems redundant with middle column which provides a more complete result.
Line 653: Here and earlier, I’m not clear why you would want to use this model outside of Korea with no VCD input (also, the focus of the paper seems to be GEMS – i.e., satellite observations). Can you elaborate under what circumstances this would be useful? I would expect it to be pretty inaccurate without the VCD, especially in regions with no monitors, and not as useful as a physical model output from something like CAMS or GEOS-CF.
Technical Comments:
There are a few minor English issues that hopefully will mostly be resolved in copy editing. I have listed a few below (non-exhaustive).
Line 16/17: Phrasing is awkward and doesn’t make sense – remove “on the ones hand” and “on the other hand”. These are usually used to describe opposites, not just different topics.
Line 20: “In short” is awkward – remove.
Line 25: Change “derived” to “calculated”
Line 25: This sentence is not clear. Need to rephrase to something like “In their study, surface NO2 was estimated by applying an assumed NO2 vertical distribution derived by a chemical transport model to tropospheric NO2 vertical column densities (tropospheric NO2 VCDs) to determine surface concentrations, using VCD measurements from the Ozone Monitoring Instrument (OMI, Levelt et al. (2006))”.
Line 31: Change “or” to “and”
Line 33: Needs a verb, i.e., “in determining surface NO2”
Line 42: “Pass over”
Line 52: Confusing phrasing “over around 20 countries”. Just say “20 countries”
Line 69: Sentence does not have a verb.
Line 140: “up to ten observations over a given location according to the season”
Line 153: Usually it’s an “air mass” not “airmass”
Line 283: Change “In all what follows” to “In all that follows”
Figure 2 and 3: In these and other figures, the linewidth, symbol size and sometimes font size are very small and hard to read on my screen. There are not many points, so there is a lot of room to improve the figures by making lines and symbols larger in future plots.
Citation: https://doi.org/10.5194/egusphere-2024-3145-RC1
RC2: 'Comment on egusphere-2024-3145', Anonymous Referee #2, 03 Dec 2024
The paper explores the use of hourly observations from the Korean Geostationary Environmental Monitoring Spectrometer (GEMS), the first geostationary satellite for monitoring trace gases over Asia, to estimate surface NO2 concentrations. The authors leverage GEMS's high temporal resolution, which allows for hourly measurements, and combine these with meteorological data as inputs for Random Forests and linear regression models. A key innovation is the use of time-contiguous data, incorporating both current and prior hours' measurements, which enhances model performance.
The study evaluates the models using spatial cross-validation with in-situ surface measurements, assessing their ability to generalize to unseen locations. Results indicate that including previous observations improves performance, with Random Forest models achieving a 4.5% to 7.5% gain and linear regression models a 7% to 15% gain across various metrics. The research focuses on Korea and highlights the critical role of GEMS tropospheric NO2 vertical column densities (VCDs) in driving model accuracy.
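One common way to implement the spatial cross-validation summarized above is to split folds by monitoring station, so that a station's data never appears in both the training and test sets. The sketch below uses scikit-learn's GroupKFold with entirely synthetic data and station IDs; it illustrates the general idea, not the authors' exact protocol.

```python
# Spatial cross-validation sketch: folds are split by station ID, so each
# model is evaluated on stations it never saw during training.
# Stations, features, and target are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
n_stations, per_station = 10, 40
stations = np.repeat(np.arange(n_stations), per_station)
X = rng.standard_normal((n_stations * per_station, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(len(stations))

scores = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=stations):
    # sanity check: no station appears in both the train and test fold
    assert set(stations[train_idx]).isdisjoint(stations[test_idx])
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
```

With this split, the held-out score measures exactly what the paper targets: the model's ability to generalize to unseen locations, rather than to unseen times at known locations.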
This work demonstrates the potential of geostationary satellite data for surface air quality assessment and underscores the advantages of incorporating temporal data in machine learning models.
The manuscript addresses an important topic and demonstrates innovative use of geostationary satellite data for estimating surface NO2 concentrations. However, substantial revisions are needed to clarify data processing, justify model choices, address potential biases, and strengthen sensitivity analyses. Providing additional validation and comparison with advanced methods could significantly enhance the robustness and impact of this work.
General Comments
- Data Processing
The data processing methodology is unclear and lacks sufficient detail. The authors should enhance this section by:
- Including a flowchart in the Data section to visually illustrate the entire data processing workflow.
- Providing a table detailing each data source, including the spatial and temporal resolution, and any preprocessing steps applied to the input datasets.
- Satellite and Ground Station Pairing
GEMS data has a relatively coarse spatial resolution (~8x8 km) compared to in-situ ground measurements. The manuscript does not clearly explain the methodology for pairing satellite pixels with ground stations. Specifically:
- The statement “we associated the location of an in situ station with the VCD pixel or meteorological pixel whose center is nearest to the station’s location” needs clarification. Does this refer to the center of the satellite pixel?
- If multiple ground stations fall within the same satellite pixel, how are these handled? Are they averaged, or is one selected?
- GEMS pixel locations vary slightly with each scan due to orbital and observation geometry. Did the authors regrid the satellite data before co-location to ensure consistency?
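The nearest-centre pairing questioned above could, in principle, be implemented as a simple nearest-neighbour lookup over pixel centres. The sketch below uses synthetic coordinates and plain Euclidean distance in degrees; a real co-location would account for great-circle distance and for the per-scan pixel drift the reviewer mentions.

```python
# Nearest-pixel-centre matching sketch: each station is assigned to the
# satellite pixel whose centre is closest. Coordinates are synthetic and
# Euclidean distance in (lat, lon) is a simplification.
import numpy as np
from scipy.spatial import cKDTree

pixel_centres = np.array([[37.0, 127.0],
                          [37.1, 127.0],
                          [37.0, 127.1]])   # (lat, lon) of pixel centres
stations = np.array([[37.02, 127.01],
                     [37.09, 126.99]])      # (lat, lon) of in-situ stations

tree = cKDTree(pixel_centres)
_, nearest = tree.query(stations)  # index of the matched pixel per station
```

Note that this mapping is many-to-one: several stations can land in the same pixel, which is precisely the averaging-versus-selection ambiguity raised in the second bullet.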
- Characteristics of Ground Stations
- What type of instruments are used at the ground stations? For example, are they chemiluminescent analyzers?
- Ground stations are often categorized as urban, background, or roadside. Did the authors use all station types, or restrict their analysis to specific types? The representativeness of the training data depends on this choice.
- Temporal Input and Data Loss
The use of prior hourly data as inputs raises some concerns:
- The model will not produce predictions for the first few hours of each day, creating data gaps.
- Cloud cover and other issues affecting satellite measurements in prior hours can propagate errors into the current hour’s input, resulting in significant data losses during training and prediction. This cascading loss reduces the dataset from over 1.3 million data points for a 1-hour input window to approximately 350,000 for a 5-hour window. The authors should provide a clear justification for accepting this trade-off between increased data gaps and potential gains in model accuracy.
- Additionally, the authors should evaluate and discuss how these data gaps impact not only the training and validation phases but also the model's predictions and its applicability to real-world scenarios. This includes addressing potential limitations in the model’s ability to generalize when encountering similar conditions in operational or extended applications.
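The reported cascade is roughly consistent with a simple independence model: if each hourly observation is valid with probability p, a row needing a k-hour input window survives with probability about p**k. The sketch below picks an illustrative p to match the reported drop from ~1.3 million rows (1-hour window) to ~350,000 rows (5-hour window); in reality cloud cover is correlated in time, so the actual loss pattern will differ.

```python
# Back-of-envelope model of cascading data loss under an (unrealistic)
# independence assumption: a k-hour window survives with probability p**k.
# Going from a 1-hour to a 5-hour window multiplies the row count by p**4;
# the reported 350,000 / 1,300,000 ratio implies p of roughly 0.72.
p = 0.72  # illustrative per-hour validity probability, not a measured value
survival = {k: p ** k for k in (1, 3, 5)}
ratio_1_to_5 = p ** 4  # relative loss when extending the window from 1 h to 5 h
```

This kind of estimate could help the authors state the trade-off quantitatively: each additional lagged hour removes roughly a fixed fraction of the training rows.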
- Justification for Input Variables and Preprocessing
- The paper lacks justification for the selection of input variables. A sensitivity analysis or variance inflation factor (VIF) analysis should be conducted to ensure the chosen variables are non-redundant and significant.
- The input variables differ in units and magnitudes, which could cause instability in model performance. Did the authors scale, normalize, or log-transform these variables before training? This critical preprocessing step is missing from the discussion.
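A VIF analysis of the kind requested here follows directly from per-feature R² values: regress each feature on all the others and compute VIF = 1 / (1 - R²). The sketch below uses synthetic data with one deliberately collinear feature to show how redundancy surfaces.

```python
# Variance inflation factor (VIF) sketch: a high VIF means a feature is
# largely explained by the other features, i.e. a redundancy candidate.
# Data are synthetic; x2 is built to be nearly collinear with x0.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 300
x0 = rng.standard_normal(n)
x1 = rng.standard_normal(n)
x2 = x0 + 0.05 * rng.standard_normal(n)  # almost a copy of x0
X = np.column_stack([x0, x1, x2])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
```

Here x1 yields a VIF near 1 while x0 and x2 yield very large VIFs, flagging the collinear pair; a common rule of thumb treats VIF above 5-10 as problematic.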
- Choice of Models
- The authors used Random Forest and linear regression but did not justify these choices.
- More advanced machine learning methods, such as Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN), or Convolutional Neural Networks (CNN), have been shown to better handle non-linear relationships and spatio-temporal dependencies in atmospheric data. The authors should explain why these advanced methods were not used or compare their results to them.
- Handling of Negative Values
- The authors ignored negative GEMS VCD values, which will bias the average toward positive values. Justification is needed for this choice.
- Similarly, were there negative values in the in-situ measurements? If so, how were these handled? This needs to be explicitly discussed.
- QA Value Threshold and Bias
- The authors only used data with QA values equal to 1. This choice filters out cloudy conditions but potentially introduces a clear-sky bias since cloudy conditions can be associated with higher aerosol or NO2 levels. The authors should address this limitation and quantify its impact on results.
- Inclusion of Latitude
Including latitude as an input variable needs further justification, as the latitudinal variation over South Korea is minimal. The authors should explain the rationale behind this decision.
Section-Specific Comments
Section 5.2:
- The atmospheric lifetime of NO2 varies with season and time of day, and this variability likely influences model sensitivity. The authors should:
- Conduct and present seasonal and diurnal sensitivity analyses to account for these variations.
- Address potential biases from the limited temporal scope of training data (January 2021 to November 2022). For instance, why was data from December underrepresented, and why were only 23 months used instead of two full years?
- Discuss whether differences in valid data points across seasons (e.g., more data in summer due to fewer clouds) lead to seasonal biases in model training.
Section 5.3:
- The prediction maps show that the model has been applied beyond South Korea, including regions over the ocean, Japan, and North Korea. The authors should:
- Validate the model's performance in these regions by comparing predictions to in-situ measurements from other countries, such as Japan. This would demonstrate the model's transferability across different geographies.
- The prediction maps also exhibit noticeable grid structures, likely originating from the meteorological ERA5 dataset. Did the authors interpolate the ERA5 data to reduce these artifacts? If not, why?
- Clarify how gaps in GEMS data (e.g., due to cloud cover) were handled during prediction. The maps show no missing areas (Figure 9 and 10), suggesting the model was applied to cloudy data despite such data being excluded during training. Discuss the implications of using potentially contaminated data and its impact on model accuracy.
Citation: https://doi.org/10.5194/egusphere-2024-3145-RC2