the Creative Commons Attribution 4.0 License.
Predictions of satellite retrieval failures of air quality using machine learning
Abstract. The growing fleet of Earth Observation (EO) satellites is capturing unprecedented quantities of information about the concentration and distribution of trace gases in the Earth's atmosphere. Depending on the instrument and algorithm, the yield of good remote soundings can be a few percent owing to interferences such as clouds, non-linearities in the retrieval algorithm, and systematic errors in the radiative transfer algorithm, leading to inefficient use of computational resources. In this study, we investigate Machine Learning (ML) techniques to predict failures in the trace gas retrieval process based upon the input satellite radiances alone, allowing for efficient production of good-quality data. We apply this technique to ozone and other retrievals using measurements from two sets of measurements: Suomi National Polar-Orbiting Partnership Cross-Track Infrared Sounder (Suomi NPP CrIS), and joint retrievals from Atmospheric Infrared Sounder (AIRS) – Ozone Monitoring Instrument (OMI). Retrievals are performed using the MUlti-SpEctra, MUlti-SpEcies, Multi-SEnsors (MUSES) algorithm. With this tool, we can identify 80 % of ozone retrieval failures using the MUSES algorithm, at a cost of 20 % false positives from CrIS. For AIRS-OMI, 98 % of ozone retrieval failures are identified, at a cost of 2 % false positives. The ML tool is simple to generate and takes < 0.1 s to assess each measured spectrum. The results suggest this tool can be applied to many EO satellites, and reduce the processing load for current and future instruments.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2392', Anonymous Referee #1, 30 Oct 2024
In their manuscript "Predictions of satellite retrieval failures of air quality using machine learning", the authors report on a study aiming at reducing the computational load of satellite data retrievals by identifying measurements for which the retrieval has a high probability of failing before even starting the retrieval. Their proposed method is based on a machine learning approach, trained on a set of satellite spectra and the corresponding error flags from the retrieval. The technique is demonstrated on MUSES retrievals of CO, temperature profiles and ozone using CrIS and AIRS/OMI radiances. The results show that a large fraction of unsuccessful retrievals can be avoided by applying the ML filtering, and that only a moderate number of successful retrievals is skipped.
The topic of this study is relevant, as slow optimal estimation type algorithms often cannot be applied to all measurements of modern satellite instruments due to a lack of computational resources. Reducing the number of unsuccessful retrievals therefore helps in providing a larger number of results at the same cost. The proposed method is convincing, and the manuscript is clearly written and structured. However, I have several concerns and suggestions which should be considered before the manuscript can be accepted for publication.
Major comments
1) When using machine learning approaches in science and data retrieval, there are always the same concerns:
- Are all relevant situations covered appropriately in the training data set?
- Has the method generalised the information sufficiently to be applied to another data set?
- Did the algorithm learn the intended connections, or has it generalised correlations which exist by chance or are not cause-and-effect type links?
- Is a bias of some form introduced in the results when applying the method?
Some of these questions are discussed throughout the manuscript, for example, in the context of the erroneous flagging of high CO values. However, the manuscript would benefit from a specific section discussing the potential problems of the method and what the authors found in their tests.
2) As far as reported in the manuscript, the method was tested on a very limited data set. Surely, more MUSES retrievals are available to test fast ML filtering. Can a more robust test be performed using data from different seasons and different years?
3) The discussion of the results for the individual flags is interesting but confusing to me. I do not understand why a new metric (the Cramer's V metric) is introduced instead of simply using the number of successful predictions and the number of false positives as quality criteria as in other parts of the manuscript. Maybe I just did not understand what the authors tried to achieve, but I do not see the benefit of this discussion.
4) In several places, the authors try to use the results of the ML filtering to identify spectral regions linked to certain error flags. This makes sense for cloud-related flags, as different parts of the spectrum contain different kinds of cloud information, and the ML algorithm may identify them. However, the formulations used in the text are sometimes unfortunate; for example, "Further, the SW CrIS band … seems to have significant importance across most of the failure flags" suggests that a certain spectral region outside the fitting window is the source of a given retrieval problem, while in reality, one condition (such as broken clouds) can lead to effects in different regions. The ML filter does not necessarily hint at cause-and-effect relations but at correlations.
5) The OMI-AIRS ozone retrieval appears to be a very good example of the large benefits of ML-based data prefiltering. However, simple filtering using the OMI cloud product would be nearly as efficient in a real-world application without any additional machine learning effort. In general, filtering for known problematic or not interesting conditions could probably be a more transparent alternative to the ML filtering approach proposed here.
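As an illustration of comment 3, the two kinds of quality criteria can be computed from the same confusion table. The sketch below is not the authors' code: the function names are hypothetical, the counts are synthetic, and it assumes that "false positives" means the fraction of good retrievals wrongly flagged as failures. For a 2x2 table, Cramér's V (without bias correction) reduces to the absolute value of the phi coefficient.

```python
import math


def confusion_counts(actual_fail, predicted_fail):
    """Return (tp, fp, fn, tn) with 'retrieval failed' as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(actual_fail, predicted_fail):
        if actual and predicted:
            tp += 1  # failure correctly predicted
        elif predicted:
            fp += 1  # good retrieval wrongly flagged as a failure
        elif actual:
            fn += 1  # failure missed by the filter
        else:
            tn += 1  # good retrieval correctly kept
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Fraction of failures identified and fraction of good retrievals flagged."""
    detection_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # assumed definition, see lead-in
    return detection_rate, false_positive_rate


def cramers_v_2x2(tp, fp, fn, tn):
    """Cramér's V for a 2x2 table: sqrt(chi2 / n), no bias correction."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    if denom == 0:
        return 0.0  # a row or column is empty, association is undefined
    chi2 = n * (tp * tn - fp * fn) ** 2 / denom
    return math.sqrt(chi2 / n)


# Synthetic counts chosen only to mimic an 80 % / 20 % split.
tp, fp, fn, tn = 80, 20, 20, 80
print(rates(tp, fp, fn, tn))
print(cramers_v_2x2(tp, fp, fn, tn))
```

The two rates answer "how many failures are caught, and at what cost", while Cramér's V condenses the whole table into one association strength, which is why it carries less operational meaning when taken alone.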
Detailed comments
- L7: duplication of "measurements"
- L12: applied to many EO satellites – applied to data from many EO satellites
- Introduction: I do not see why the data rate of GEO instruments should be higher than that of LEO instruments. In practice, this might be the case, but this is more linked to GEO instruments being rather recent additions with better detectors.
- Introduction: I think the main message of the authors is that more data is coming from the new generation of satellite instruments than can be analysed in NRT. For this simple statement, many references are used, which does not make sense to me. I suggest reducing them and focusing on those relevant to this study.
- Introduction: I also think that it should be mentioned that the problem addressed is mainly limited to Optimal Estimation type retrievals, while many other algorithms are fast enough to process the full volume of satellite data routinely.
- L67: “retrievals absorb” => “retrievals use absorption”
- Tables 1 & 2: I do not see the need for these tables
- Section 2.3.1: I think this can be shortened as it is not relevant to the manuscript
- Section 3.1.2: I'm not an ML expert, but I think it would be good to add a bit more information on the method of the "Extremely randomised trees" used here – are there no hyperparameters and other settings specific to the model you applied?
- L247: “only training is performed on” => “training is performed only on”
- L279: "These results suggest that non-fitted elements in the retrieval process have a significant impact on the overall quality of retrievals" – I'm not sure what the authors are trying to say and how this can be deduced from the fact that the ML algorithm is using information from outside of the fitting window to predict failure of the retrieval better. To me, this feels like a confusion of correlation and cause-and-effect relationship.
- L308: As mentioned above, the ozone failure flag is special as it is linked to cloud cover in a simple and easy-to-predict way.
- Section 4.3: I was surprised that the authors did not evaluate whether combining the prediction of individual flags would be better than training for the overall success flag.
- Section 5: References to Fig. 11 should probably be to Fig. 10. Figure 11 is not discussed at all, as far as I can see.
- Figure 10: Left column repeats the same figure three times, which I guess is a mistake.
- Figure 11: Something is not quite right here – the right figures' colour scale does not seem correct.
- Figure 11 caption: "from a day in 2020" – which day?
- L375: "do a good job in predicting the actual failures" – this is not clear from the current set of figures.
- L418 and elsewhere: I find the percentage speedups difficult to understand. What is a 100% speedup? At least to me, it would be easier to understand if the reduction in computational time is given.
- L439: Again, I'm confused by the speedup given. If 74% of the data is removed, I would either see a speedup by a factor of 4 or a reduction in computational time by 74%.
- Figure 15: it is clear from the figure that the filtering is mainly removing cloudy scenes and the right part of the OMI swath
- L459: "This cost/benefit can be improved…". This might be the case, but the authors have not shown any indication of that
- L460: "This work represents the first step in understanding why and how retrievals fail" – I disagree. This is not what this work is about. If you are interested in finding the reasons for failing retrievals, the detailed error information from the OE retrieval will be more helpful.
- L482: A different name is used here for your ML method than in the description in the text. Please make it consistent.
- L485: "which can be reduced..." – again, this has not been shown
- L486: "speedup of 66%" – I do not know how you compute these numbers. Only 67% of the original retrievals have to be performed, leading to a reduction of the computational time by 33%. The speedup would be by a factor of 1.47, but as discussed above, I think the computational time is much easier to understand.
- Appendix A and B: I do not think that this is needed or adds anything to the manuscript
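The speedup comments above (L418, L439, L486) rest on simple arithmetic relating the fraction of retrievals skipped to the remaining compute time. A minimal sketch follows, assuming every retrieval costs the same on average and that skipped retrievals cost nothing apart from an optional filter overhead. The function name and the `filter_overhead` parameter are hypothetical and do not come from the manuscript.

```python
def compute_time_metrics(fraction_skipped, filter_overhead=0.0):
    """Relate the fraction of retrievals skipped to compute-time savings.

    fraction_skipped: share of soundings removed by the filter (0 <= f < 1).
    filter_overhead: cost of running the filter, expressed as a fraction of
        the original total retrieval time (assumed small, default 0).
    """
    if not 0.0 <= fraction_skipped < 1.0:
        raise ValueError("fraction_skipped must be in [0, 1)")
    if filter_overhead < 0.0:
        raise ValueError("filter_overhead must be non-negative")
    # Time still needed, relative to processing every sounding.
    remaining = (1.0 - fraction_skipped) + filter_overhead
    return {
        "time_reduction": 1.0 - remaining,   # e.g. 0.74 means 74 % less time
        "speedup_factor": 1.0 / remaining,   # e.g. 4.0 means four times faster
    }


# Example from the comment on L439: 74 % of the data removed by the filter.
metrics = compute_time_metrics(0.74)
print(f"time reduction: {metrics['time_reduction']:.0%}")
print(f"speedup factor: {metrics['speedup_factor']:.2f}")
```

With 74 % of soundings skipped and no overhead, the time reduction is 74 % and the speedup factor is about 3.85, in line with the "factor of 4" reading. If failing retrievals cost more or less than successful ones, the equal-cost assumption no longer holds and the numbers shift accordingly.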
Citation: https://doi.org/10.5194/egusphere-2024-2392-RC1
RC2: 'Comment on egusphere-2024-2392', Anonymous Referee #2, 05 Nov 2024
Predictions of satellite retrieval failures of air quality using machine learning
Malina et al.
Summary
This paper investigates the usefulness of machine learning to streamline data processing for incoming satellite retrievals. The authors highlight the need for improvements in the computational time to go from level-1 to level-2 data, given the ever-increasing amount of data being created on a daily basis. The principle they explore in this paper is to use machine learning to remove retrieval failures before the processing stage, therefore reducing the amount of data needing to undergo time-consuming processing. They found that an extremely randomized tree model was the best fit for the task and trained the model on CrIS and AIRS-OMI data for ozone, CO and temperature profile.
Their model performs reasonably well with a few caveats and at a high speed, showing how this could be applied to future EO missions and data processing.
Major Comments
This is a well written paper and thorough study that fills an obvious niche. I have a few comments below.
There is a lot of technical detail in the paper, which makes it a long read. Most of it is needed but some sections (e.g. 2.3.1) could be shortened as they’re not as relevant to the study.
There is a little discussion about the cost/benefit at the end of the paper but there isn’t much information/calculations on the actual benefit in terms of computing speeds. It would be good to expand on this point as that is the primary motivation for the paper.
I would like to see an expansion in the discussion about what the next steps would be to improve the model and what might be considered a good enough model to be implemented.
Minor Comments
Line 28: is this not the case for all of the TROPOMI species, not just ozone?
Line 48-50: These sentences don’t scan well
Line 61: This doesn’t scan well, should the ‘allowing for multiple different products’ be in brackets?
Line 63: I think these two sentences should be joined.
Line 124 (and elsewhere): There are some inconsistencies between ‘L1b’ and ‘L1B’ throughout the text.
Line 230: Should this be “number of true positives”?
Figure 10: The figures on the left appear to be repeated instead of for each species.
Line 372: Wording doesn’t make sense. Should this be a comma instead of a full stop?
Section 5: There appears to be no reference to figure 10 although I think the text is actually meant to be referring to figure 10 but states figure 11. In which case there would be no reference to figure 11.
Citation: https://doi.org/10.5194/egusphere-2024-2392-RC2
RC3: 'Comment on egusphere-2024-2392', Anonymous Referee #3, 06 Nov 2024
This study employs machine learning (ML) to predict retrieval failures based on measured radiances from sensors (CrIS and AIRS+OMI). By using ML as a pre-processing tool, computational resources can be utilized more effectively, as retrieval algorithms are often computationally intensive due to the high precision and accuracy they require. The manuscript is well-written and provides a good analysis of the ML algorithm's ability to filter spectra based on measured radiances from various instruments.
I believe the study is suitable for publication after addressing the following minor points:
1. Line 200: Regarding PCA, how much of the variance is explained by the 30 components?
2. Section 4.2: When evaluating feature importance, the conclusion suggests that features outside a given window are as important, if not more so (depending on the retrieved species), as those within the window. If the master quality flag includes information from all windows, this would logically increase the importance of information outside the window. Additionally, the spectrum contains information about O₃, CO, and TATM outside their respective windows. Could the ML algorithm be sensitive to these regions as well? If so, could future work explore developing an ML algorithm to extract O₃, CO, and TATM directly from the spectra? This can be addressed as potential future work in section 6.
3. In Section 5, O₃ and CO are shown for different ML threshold values. This section could be strengthened by comparing the different filtering thresholds to a truth proxy. This would allow for a clear presentation of how the filtering threshold impacts bias and precision. The current quality filter would be a baseline for comparison.
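The comparison suggested in point 3 could be sketched as follows. This is an illustrative outline only: the record layout, the function name, and the convention that a sounding is kept when its predicted failure probability is below the threshold are all assumptions, and the numbers are synthetic rather than taken from the manuscript.

```python
from statistics import mean, stdev


def threshold_sweep(records, thresholds):
    """Bias and precision of retained soundings for each ML threshold.

    records: iterable of (p_fail, retrieved, truth) tuples, where p_fail is
        the predicted failure probability and truth is a truth-proxy value
        (hypothetical, e.g. a collocated reference measurement).
    Returns a list of (threshold, n_kept, bias, precision) tuples.
    """
    records = list(records)
    rows = []
    for threshold in thresholds:
        # Keep soundings the filter considers unlikely to fail (assumed rule).
        diffs = [ret - tru for p_fail, ret, tru in records if p_fail < threshold]
        if len(diffs) < 2:
            rows.append((threshold, len(diffs), None, None))
            continue
        bias = mean(diffs)        # mean retrieved-minus-truth difference
        precision = stdev(diffs)  # sample standard deviation of differences
        rows.append((threshold, len(diffs), bias, precision))
    return rows


# Synthetic example values, not real retrievals.
records = [
    (0.1, 41.0, 40.0),
    (0.2, 39.0, 40.0),
    (0.6, 46.0, 40.0),
    (0.9, 30.0, 40.0),
]
for row in threshold_sweep(records, thresholds=[0.5, 1.0]):
    print(row)
```

Running the same sweep with the existing quality flag in place of the ML threshold would give the baseline row the referee asks for.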
Citation: https://doi.org/10.5194/egusphere-2024-2392-RC3