Ground-Based Validation of Sentinel-5P TROPOMI Atmospheric Products using Calibration-Informed Low-Cost Multi-Spectral Sensors
Abstract. Ground-based validation of satellite atmospheric products is essential for ensuring data quality and algorithm performance. We present a validation approach for Sentinel-5P TROPOspheric Monitoring Instrument (TROPOMI) cloud fraction products using a multi-spectral ground station (DG2MCM-15) located in Kempten, Bavaria, Germany. The ground observatory combines professional metrological experience from ISO/IEC 17025 accredited laboratory environments with low-cost commercial sensors, creating a citizen science validation capability.
Our validation dataset comprises 276 temporally matched observations between Sentinel-5P overpasses and ground measurements over a four-week period (January 11 – February 8, 2026). Ground-based cloud detection using an MLX90614 infrared pyrometer achieves strong agreement with Sentinel-5P cloud fraction retrievals (Pearson R = 0.879, N = 27 after quality filtering). The root mean square error of 29.1 % cloud fraction reflects a systematic positive bias from spatial scale mismatch between the ground sensor field of view and satellite pixel dimensions. The method reliably distinguishes between clear, partially cloudy, and overcast conditions, though the derived cloud fraction values exhibit clustering due to the temperature-ratio approach used. Exploratory comparison with TROPOMI aerosol index products yielded negligible correlation due to the absence of UV spectral coverage in the ground sensor, identifying a clear instrumentation requirement for future aerosol validation work.
Temporal matching between satellite overpasses and ground observations achieved a mean time difference of 2.7 minutes, with 95 % of matches within 8 minutes of satellite observation time. Spatial co-location analysis confirms all validation points fall within the nominal TROPOMI pixel footprint (3.5 km × 5.5 km at nadir), though this spatial scale mismatch between the ground sensor field of view and the satellite pixel remains the primary source of validation uncertainty.
Our results demonstrate that low-cost infrared sensors, when operated with calibration-informed measurement protocols, can provide scientifically useful satellite cloud product screening data, reliably distinguishing between clear, partially cloudy, and overcast conditions. The quasi-discrete nature of the derived cloud fraction highlights the need for improved cloud detection algorithms in future work. This approach offers a scalable pathway for expanding ground-based validation networks in regions lacking dedicated atmospheric monitoring infrastructure.
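A minimal sketch of the temporal matchup step described in the abstract, assuming a pandas-based pipeline; the column names, sample values, and the 8-minute tolerance are illustrative placeholders, not the author's actual code:

```python
import pandas as pd

# Satellite overpass times and ground samples (placeholder values).
overpasses = pd.DataFrame({"overpass_time": pd.to_datetime(
    ["2026-01-12 12:03", "2026-01-13 11:44"])})
ground = pd.DataFrame({
    "obs_time": pd.to_datetime(["2026-01-12 12:05", "2026-01-13 11:50"]),
    "t_sky_c": [-18.4, -2.1],   # MLX90614 sky temperature
    "t_amb_c": [1.3, 2.8],      # ambient temperature
})

# Pair each overpass with the nearest ground sample within the tolerance.
matched = pd.merge_asof(
    overpasses.sort_values("overpass_time"),
    ground.sort_values("obs_time"),
    left_on="overpass_time", right_on="obs_time",
    direction="nearest", tolerance=pd.Timedelta(minutes=8))

matched["dt_min"] = (
    (matched["obs_time"] - matched["overpass_time"]).abs().dt.total_seconds() / 60)
print(matched[["overpass_time", "obs_time", "dt_min"]])
```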
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-817', Anonymous Referee #1, 27 Feb 2026
- AC1: 'Reply on RC1', Wolfgang Schneider, 28 Feb 2026
- RC2: 'Comment on egusphere-2026-817', Anonymous Referee #2, 21 Mar 2026
General comments:
The manuscript "Ground-Based Validation of Sentinel-5P TROPOMI Atmospheric Products using Calibration-Informed Low-Cost Multi-Spectral Sensors" by Schneider presents a proof of concept to measure cloud coverage with low-cost ground-based sensors and evaluates the approach with comparisons to the cloud product from the polar orbiting TROPOMI instrument onboard the Copernicus Sentinel-5P satellite.
The paper is overall well written and the scientific topic investigated is justified and matches well with AMT.
The instrumentation used is well described, its advantages as well as its limitations are pointed out, and its potential as a complementary tool to high-cost network stations is rightly justified. The approach and method using sky temperatures to estimate cloud coverage are well described.
The structure of the manuscript makes sense overall, although some parts could potentially be shortened or left out, e.g. the parts related to trace gases, since these have in fact not been measured at all.
My main concerns are the following:
- The title mentions "TROPOMI Atmospheric Products" but the content does not provide retrievals of aerosol, ozone, or NO2, only cloud fraction. Hence I would suggest reflecting that in the title and replacing "Atmospheric Products" with "Cloud Fraction".
- The title claims a "Ground-Based Validation". In my opinion the term "Validation" is a bit too strong here. For a proper validation I am missing a larger data sample, with data covering more seasons and more atmospheric conditions. My suggestion would be to rename "Ground-Based Validation of..." to "Ground-Based Comparisons to...".
- The claim of high correlation in cloud fraction is purely driven by two single measurement points (see Figure 2b). More measurements, particularly at higher DG15 MLX cloud percentages, are needed to uphold this claim. Although the presented data set provides a solid foundation, I believe that a larger measurement data set would significantly benefit this manuscript.

I recommend publication of the manuscript after the following minor revisions:
Specific comments:
Line 8: in order to increase the number of matched observations, the author could try to relax the quality filtering condition of QA >= 0.5 and see if this improves the data sample size, but the more appropriate way in my opinion would be to conduct more measurements until a statistically meaningful sample size covering the whole parameter space is achieved.
Line 114: Since clouds are well visible in RGB imagery, it would be interesting to see if the cloud fractions derived from the MLX IR temperatures could be backed up also with the TCS photometric RGB imagery? Maybe one or two RGB example plots could be added?
Line 162: The proper reference for the OCRA/ROCINN cloud retrieval algorithms used for the operational TROPOMI cloud product is Loyola et al., 2018: The operational cloud retrieval algorithms from TROPOMI on board Sentinel-5 Precursor, https://doi.org/10.5194/amt-11-409-2018.
Line 168: An up-to-date description of the QA value recommendations for each TROPOMI product can be found in the Product Readme Files (PRF) at https://sentiwiki.copernicus.eu/web/s5p-products. In the PRF-CL for the cloud product, section 3.1 summarizes the current recommendations. As mentioned earlier, the author could try to relax the filter criterion a bit in order to see if this increases the number of measurements for the comparisons.
Lines 199-200: It could be added and emphasized here that the Sentinel-5P OCRA/ROCINN cloud products are retrieved in the UV-VIS-NIR spectral region and that different sensitivity to different types of clouds when compared to the IR sky temperature measurements might be another source of systematic biases.
Lines 200-201: Same comment as for Line 168. The author could try if relaxing the QA threshold (or even using the initial 71 total matches) still leads to similar conclusions as with the quality-filtered N=27 data set.
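A minimal sketch of that QA-relaxation test, assuming an xarray-based workflow; the file name is a placeholder, while the PRODUCT group and the qa_value variable follow the standard S5P L2 NetCDF layout (the cloud_fraction variable name may differ by processor version):

```python
import xarray as xr

# Open the PRODUCT group of an S5P L2 cloud file (placeholder file name).
ds = xr.open_dataset("S5P_OFFL_L2__CLOUD__example.nc", group="PRODUCT")
cf = ds["cloud_fraction"]
qa = ds["qa_value"]

# Compare the recommended filter with a relaxed one.
for threshold in (0.5, 0.35):
    kept = cf.where(qa >= threshold)
    print(f"QA >= {threshold}: {int(kept.notnull().sum())} valid pixels")
```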
Lines 209-210: This clustering is interesting. Since a cloud fraction retrieval on a continuous scale seems not achievable with the presented method, the author could try to evaluate the comparison using discrete bins like e.g. clear (<20%, which is often used as clear-sky threshold in trace gas retrievals), low (20%-50%), high (50%-75%) and covered (>75%) or other empirical thresholds.
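A small sketch of the proposed binning, using the example thresholds above; the bin edges and labels are the reviewer's suggestion, not an established standard:

```python
import numpy as np

def classify(cf_percent):
    """Map a cloud fraction in percent onto the suggested discrete bins."""
    edges = [20, 50, 75]                        # clear | low | high | covered
    labels = ["clear", "low", "high", "covered"]
    return labels[int(np.searchsorted(edges, cf_percent, side="right"))]

for cf in (5, 35, 60, 90):
    print(f"{cf:3d} % -> {classify(cf)}")
```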
Figure 1: In the title, as in the manuscript title, the term "Validation" could be replaced with the term "Comparison"
Figure 2: In the title, as in the manuscript title, the term "Validation" could be replaced with the term "Comparison"
Figure 2: In panel (b) the slope of the linear fit is largely driven by only two data points at high MLX cloud percentage values. As mentioned earlier, a larger data sample which also covers the whole parameter space would largely increase the robustness of the presented analysis.
Line 238: It is written that according to Fig. 2d, lower cloud top heights correlate with warmer sky temperatures. Maybe I am misinterpreting the plot, but don't low cloud heights suggest colder sky temperatures (< -10 °C) in Fig. 2d?
Line 240: It is written that high altitude clouds show colder sky temperatures. Maybe I am misinterpreting the plot, but isn't there a bi-modal distribution with high-altitude clouds both at colder temperatures (with lower cloud fractions) as well as at warmer temperatures (with higher cloud fractions)?
Lines 243-244: It is written that Cloud and ozone products achieved the highest match rates, but Table 1 indicates a highest match rate for NO2 (82%). Does the author mean matched pairs instead of match rate?
Sections 3.5 and 3.6: Although it is well explained why the current instrumentation is not suited to retrieve Aerosol or trace gas products, maybe some of the available instrumentation could be used to back up the cloud fraction observations, e.g. the RGB measurements for a visual inspection of the scenes?
Section 4.1: The limiting factors of the comparison exercise are well explained here. As mentioned before, I believe that a larger data set covering more atmospheric conditions would be needed to strengthen the strong correlation claim.
Section 4.3: The "citizen science" approach to support existing "high-cost" networks is very interesting and in my opinion worthwhile to be pursued further.
Section 5: The planned future work listed in this section sounds promising and should be pursued further. In my opinion, the most important steps are the increase of the sample size by additional observations at more atmospheric conditions as well as to address the clustering limitation.
Technical corrections:
Figure 1: The panel indicators (a), (b), (c), (d) written in the figure caption are not seen in the plot titles.
Figure 2: The top right legend in panel (a) for the blue squares seems to cover two data points, therefore it would be better to move the legend to the left side of the plot so that the data points can be seen.
Line 243: table -> Table
Citation: https://doi.org/10.5194/egusphere-2026-817-RC2
- AC2: 'Reply on RC2', Wolfgang Schneider, 22 Mar 2026
Dear Reviewer #2,
Thank you for your careful and constructive review of our manuscript. We have addressed all comments in the attached Response Letter (PDF supplement). The revised manuscript has been submitted separately through the manuscript revision system.
The main scientific finding of this revision is that extending the observation period from 28 to 96 days reveals the S5P QA filter as the primary limiting factor for ground-satellite matchups in the Pre-Alpine region — not the ground measurement density. Both QA thresholds (≥0.5 and ≥0.35) are now reported and discussed honestly.
All 15 specific comments have been addressed. Please see the attached PDF for details.
Best regards, Wolfgang Schneider · DG2MCM
- RC3: 'Comment on egusphere-2026-817', Anonymous Referee #3, 24 Mar 2026
1. Summary of manuscript
The author presents a methodology for reliably estimating cloud cover from low-cost commercial-off-the-shelf (COTS) infrared sensor measurements and compares the COTS sensor-derived cloud cover with that from Sentinel-5P. These instruments can potentially be used by citizen scientists and similar organizations to provide spatially distributed ground validation of satellite cloud cover estimates to augment the limited spatial coverage of dedicated meteorological ground stations. A major concern with low-cost infrared sensors is that they are not calibrated to the same accuracy and precision as ground-station instruments. As such, cloud cover estimates calculated from absolute temperature measurements will be subject to errors from the poor calibration of the COTS sensors. A second concern is that infrared measurements from COTS sensors will be influenced by sky cover, cloud altitude, and low-level temperature, but the COTS sensors won't have additional instruments to separate these influences. To avoid these problems, the author instead provides a methodology in which cloud cover is estimated from the ratio between ambient and sky temperature measurements, adjusted for multi-day clear-sky values observed by the same sensor. The comparison between co-located ground sensor and satellite cloud cover estimates reveals a large correlation of 88 %; however, the author notes multiple complications with this result. Aside from cloud cover and top height, the author briefly discusses aerosol and trace gas metrics, but no conclusive results are presented.
2. Overall assessment
The ideas and method presented in this manuscript are worthy of consideration and represent a potential major opportunity to improve citizen science contributions to atmospheric science. Automated cloud measurements from a low-cost distributed network are a significant technical and logistical challenge beyond that of other measurement types such as precipitation and temperature. There are ongoing efforts to semi-automate cloud measurements from citizen scientists, such as collecting photographs of cloud/sky conditions and using neural networks to categorize them. Including an additional automated option to complement traditional naked eye reports and semi-automated observations would be a great advantage. The author's investigation into using COTS infrared sensors for this purpose works well towards pursuing this goal. I also appreciate a dedicated space to discuss the results and their relevance to citizen science and other data sources in Section 4.
That said, the manuscript has a number of shortcomings that need to be addressed before it is ready for publication. There were multiple parts of the manuscript that confused me or whose line of reasoning I found difficult to follow. I therefore recommend that major revisions are needed before the paper is ready to be published. I have listed my concerns below. I think that if these are addressed well, then this manuscript will potentially provide a valuable resource for citizen scientists and similar non-professional entities that wish to contribute automated cloud observations to complement traditional visual reports.
3. Major comments
3.1. The narrative supporting and justifying the main analysis method and results is underdeveloped, and I find it difficult to follow the author's reasoning at times. I feel like there are logical steps that are missing or are implied rather than stated explicitly, and the text can be very obtuse and jargon-laden at times. From what I can tell, the logical chain of reasoning is as follows: 1) Satellite estimates of cloud cover need ground validation beyond what automated stations can provide; 2) citizen scientists can use COTS infrared sensors to supplement the automated stations; 3) COTS sensors cannot currently be calibrated rigorously like automated station instruments (e.g., too costly, time consuming, inconvenient, lack of proper calibration equipment); and thus 4) a cloud cover estimation method that minimizes the effects of poor calibration and does not require additional expensive sensors should be used for the COTS sensors. If this chain of reasoning is incorrect, then I may have misunderstood the purpose of the paper, and I'm concerned that other readers will as well. There are also portions of Section 4 that at first glance seem to contradict the reasoning presented in the rest of the paper (presuming that I correctly understand the rest of the paper). There seems to be some circular reasoning used as well, which I note in the minor comments, where it isn't clear whether the COTS sensors are being used to validate the satellite or vice versa. I suggest the author revise Sections 1, 4, and 5 to better clarify the motivation of the manuscript and the logical connection between the motivation and the analysis/results.
3.2. The author notes that the results presented in Fig. 2, and the large correlation value of 88 % in Fig. 2b, should be treated with caution. I agree, and I find Fig. 2b to be highly problematic. Without the raw numbers in front of me, I find it very difficult to understand how 27 measurements of continuous sky and ambient temperature values can collapse into just three discrete estimated cloud cover values. As far as I can see, this isn't merely clustering behavior, which might be expected if there were three different prevailing sky conditions present during the study time (e.g., clear, cirrus-dominant, and low cloud-dominant). Instead, as far as I can see, there are three and only three cloud cover values outputted by Eq. 1. There is nothing about Eq. 1 that suggests that it should output discrete values, unless either the inputs are discrete (which would not be compatible with Fig. 2c, which shows continuous temperature measurements) or there is some problem in the calculation of Eq. 1, e.g., unintended integer division instead of floating point. The author mentions that the COTS sensor has an uncertainty range of ±0.5 °C which leads to a cloud fraction uncertainty of ±5 %, but that does not explain the large discrete jumps in Fig. 2b. I would like the author to further explain how this odd result arises and why the results should be trusted despite it.
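As a toy illustration of the suspected failure mode (later confirmed in AC3 as firmware-level integer quantisation on the RAK4631), the sketch below shows how truncating an intermediate ratio to a coarse integer step collapses continuous temperatures onto a few discrete cloud fractions; the ratio formula and the 25 % step are generic placeholders, not the manuscript's actual Eq. 1:

```python
def cloud_fraction(t_sky, t_amb, t_clear=-25.0, quantise=True):
    # Generic temperature-ratio placeholder, continuous in [0, 1].
    ratio = (t_sky - t_clear) / (t_amb - t_clear)
    if quantise:
        # Emulate firmware integer math with a coarse 25 % step:
        # int() truncation discards everything between the steps.
        return 25 * int(ratio * 100 / 25)
    return 100 * ratio

for t_sky in (-20.0, -12.0, -11.0, -3.0, -2.0):
    q = cloud_fraction(t_sky, t_amb=2.0)
    c = cloud_fraction(t_sky, t_amb=2.0, quantise=False)
    print(f"T_sky={t_sky:6.1f} C  quantised={q:3d} %  continuous={c:5.1f} %")
```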
3.3. I would also like a more detailed discussion about Eq. 1 and the assumptions and limitations of the methodology. The author proposes an altered approach to previous efforts the author cites (and states this), and so I think a little more discussion of the reasoning behind the method is necessary. My biggest concern is that Tsky is influenced by both cloud cover and cloud altitude (and secondarily by cloud phase and thickness), but it isn’t obvious to me that Eq. 1 has a way to easily separate cloud cover and altitude. Multiple cloud layers with lower-level cover < 100% might also be a problem. It seems the author is aware of this and indirectly discusses it by mentioning the difference in observing cirrus clouds vs. low level clouds (i.e., we might expect an underestimate of cirrus cover), and by showing satellite-derived cloud altitude in Fig. 2d. Nevertheless, I think the author needs to explicitly and clearly discuss this issue in Section 2.1.2.
3.4. The author mentions multiple times having “professional metrology experience” which guides the testing methodology. I find the repeated statement of this to be unusual and distracting. I generally presume that someone publishing research in an accredited scientific journal would have some relevant knowledge and experience in the topic material! The analysis, results, and supporting discussion presented in a journal paper should stand on their own merits and not depend on the author’s credentials and experience. I can possibly understand if the author wants to make a point about the intersection of citizen science projects and professional research techniques/standards, which I think is the point of Section 4.3. Otherwise, the multiple mentions of the author’s expertise lend nothing to the strength of the material and should be reduced or removed.
3.5. The author includes a small amount of analysis and discussion of other measurements from the ground site, such as aerosols and trace gases, but the discussion is brief and includes either inconclusive or negative results. While I agree that it is important to publish negative results in addition to positive results (and the scientific community should do a better job of it), I'm not sure that the results presented for aerosols and trace gases help the overall narrative. I think a better approach for the paper would be to either 1) expand the aerosol and trace gas analyses to be comparable with the cloud cover analysis; or 2) reduce the discussion of the aerosol and trace gas analysis to a brief mention of concurrent/future work, and spend more of the paper discussing strengths and limitations of the COTS IR sensors. Either approach would be fine, as the author sees fit, but right now the analysis sits in an awkward in-between position.
4. Minor comments
L40. There seems to be some circular reasoning in how these COTS sensors are being used and tested here. The long-term plan, as I understand it, is to present a cloud cover estimate method for COTS sensors to allow the use of COTS sensors to validate satellite cloud cover estimates. Sentinel-5P is used in this study, but other satellites of interest are mentioned as well. Sentinel-5P cloud cover is validated against the COTS sensor estimates. But the methodology for testing the COTS sensor cloud estimates relies on Sentinel-5P cloud cover. Isn’t that a circular argument? Am I missing something? It seems to me that there should be an independent third source of measurement to check both the COTS and satellite estimates, such as the established networks discussed in Section 4.
L40. Also, if these COTS sensors are to be used for citizen science, then I would be very interested to see how they compare/contrast with traditional visual reports. Are there any plans for that sort of analysis?
L83. Tilting the sensor should also add or amplify 3D cloud cover estimate errors, correct? That is, cloud cover tends to be overestimated when looking close to the horizon compared to looking vertically, especially for cloud types with significant vertical growth. Is this accounted for in the analysis? I presume the issue is somewhat offset by the reduced IR influence of clouds at low elevation angles because of increased path length through the lower troposphere (as noted in the text), but is there still a notable effect?
L104. Does ambient humidity variability have much of an effect on uncertainty? Would it have much of an effect if this experiment is repeated in other locations with arid or humid climatologies?
L110. Perhaps a simple Tsky threshold could be used to determine situations when cirrus cloud cover is likely, and thus a correction factor to Eq. 1 can be applied?
L146. This statement, along with comments in Section 4, seem to contradict the motivation of this experiment. Why put the time and effort to develop a methodology that accounts for a lack of calibration for the COTS sensors when there are future plans to develop calibration devices? Is there expected to be a large time gap in the deployment of the calibration devices, and thus an interim cloud cover estimation technique will be limited? Will the future calibration devices be limited by expense and/or availability, and thus an alternative method is needed to cover uncalibrated sensors? I’d like a little more explanation to tie this statement with the rest of the paper.
L148. The results of the COTS sensor verification with other instruments and the weather station seem important enough to discuss further. If there isn't a relevant citation, it might be good material to include in an appendix or supplemental material.
Figure 1. The panels need a/b/c/d labels. Also, the ordering of the panels doesn’t seem to match the a/b/c/d ordering in the figure caption.
Figure 1, lower right panel: I’m not sure how to reconcile this panel with the statement in the text of all co-located data points being within 3.5 km of the ground site. The figure shows many data points much further than 3.5 km.
L241. Presumably there would be situations outside of this specific study where the assumption of cloud top height (which the satellite is sensitive to) matching cloud top base (which the ground sensors are sensitive to) fails (e.g. cumuliform clouds, deep nimbostratus, multiple cloud layers). Does the methodology account for these situations?
L298. Were cumulus clouds common during the experiment?
L308. Again I am confused about the reasoning for the experiment, and it seems at face value that this is a circular argument. Are the COTS sensors being used to validate Sentinel-5P, or is Sentinel-5P being used to test the sensors? If the COTS sensors are being assumed as ground truth, then Fig. 2b is highly problematic.
L317. I think this might be part of what kept confusing me when reading this paper – despite the section heading, there is no comparison of the COTS sensors with established networks in this paper – only with Sentinel-5P. It might be better to reword this section to describe the advantages and disadvantages of the COTS sensors vs. established networks, instead of alluding to a comparison which is not presented – and a comparison which I think would significantly strengthen this line of research.
L330. This isn’t a critique, but this is what I find very exciting about the material presented in the paper. I am very interested in exploring the use of COTS sensors by citizen scientists to aid traditional observation techniques. If possible, I would be interested to learn more about the practical ways in which this work can be applied to citizen science, and the link between citizen and professional science.
L334. It would be helpful to clearly define and describe the distinction between “calibration-informed methodology” and “formally calibrated measurements”.
L338. This paragraph feels very strange in how it is written. If detailed calibration issues that are understood via having a “professional metrology background” are important for the analysis, then they need to be described, and it should be explained why the analysis method addressed them. It would be good material for an appendix or supplemental material if placing them in the main text would disrupt the narrative.
L382. Once again, what is validating what? How are the COTS sensors being used to identify cloud-contaminated scenes for satellites when the only check on the COTS sensors is a satellite?
L404. I don’t see how this result follows from Fig. 2. At most, I see it distinguishing between “overcast” when the x-axis = 50% or 75%, and “not overcast” when the x-axis = 25%. If the COTS sensors and Eq. 1 are going to be used to detect cloud contaminated scenes for the satellite, then the result in Fig. 2b is insufficient.
Citation: https://doi.org/10.5194/egusphere-2026-817-RC3
- AC3: 'Reply on RC3', Wolfgang Schneider, 28 Mar 2026
We thank Reviewer #3 for the thorough and constructive review. All major and minor comments have been addressed in the revised manuscript. A detailed point-by-point response is provided in the supplement (Reply_RC3_egusphere-2026-817.pdf), including Before/After text panels for all significant changes and the revised Figure 1.
The most important outcome of this revision is the honest and complete explanation of the firmware-level integer quantisation in the RAK4631 microcontroller (Comment 3.2), which clarifies the discrete cloud fraction values in Fig. 2b. The paper has been reframed as a proof-of-concept feasibility study demonstrating a 3-state sky condition classifier (Clear / Partly Cloudy / Overcast), with the 3-bin confusion matrix (Fig. RC2-3) as the primary performance metric. The circular reasoning has been resolved and the aerosol/trace gas sections have been appropriately reduced in scope.
The revised manuscript (main.pdf) has been submitted separately as "Revised Manuscript".
General Comments
This paper explores a custom approach to validating satellite data products (focusing on cloud fraction) as a proof-of-concept for what might be achieved using relatively low-cost equipment.
The approach is well-considered, and I appreciate the detailed consideration of methodology and limitations. As noted in the discussion, my primary concern is that there is too little quantitative information to draw meaningful conclusions, which results from both the relatively short sampling period and the limited precision of the derived cloud fraction from the ground-based sensor being tested. Therefore, my main recommendation is that the paper be revised and resubmitted after a longer data collection period, during which a broader range of meteorological conditions can be sampled, hopefully resulting in a greater dynamic range in the results to enable more robust conclusions.
Cloud fraction conclusions are stated a bit too strongly in some places given the relatively short study period and the limited precision of the cloud fraction calculated from MLX90614. In particular, the strong correlation reported is driven entirely by two points (Figure 2b); without these, correlation would be 0. Though this is noted in the discussion (lines 288-292), the message could be clearer throughout, especially in the abstract.
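A quick way to make that sensitivity explicit is a leave-out check of the kind sketched below; the sample values are invented and only mimic the shape of Figure 2b (one cluster plus two high points):

```python
import numpy as np

ground = np.array([25, 25, 25, 25, 25, 25, 50, 75])    # MLX-derived CF (%)
tropomi = np.array([30, 45, 20, 55, 35, 40, 80, 95])   # TROPOMI CF (%)

r_all = np.corrcoef(ground, tropomi)[0, 1]

# Drop the two high-cloud-fraction points that drive the fit.
keep = ground < 50
x, y = ground[keep], tropomi[keep]
# With all remaining ground values identical, r is undefined (no signal left).
r_wo = float("nan") if np.allclose(x, x[0]) else np.corrcoef(x, y)[0, 1]
print(f"r (all points) = {r_all:.2f}; r (without the two drivers) = {r_wo}")
```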
Given those limitations, I would also suggest a binning approach for the comparisons of Figure 2, i.e., divide TROPOMI cloud fractions into 3 bins centered on the MLX90614 cloud fraction clusters (< 38%, 38-63%, > 63%) and produce a “confusion matrix” plot showing when measurements fell into the same broad bins vs. when TROPOMI and MLX90614 disagreed. This could give a better sense of the qualitative capabilities of the comparison.
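A minimal sketch of that confusion-matrix evaluation, with the cluster-centred bin edges given above; the sample arrays are placeholders:

```python
import numpy as np

edges = [38, 63]                              # bins: <38 %, 38-63 %, >63 %
ground = np.array([25, 25, 50, 75, 25, 50])   # MLX90614-derived CF (%)
tropomi = np.array([30, 45, 55, 90, 70, 40])  # TROPOMI CF (%)

g_bin = np.digitize(ground, edges)            # bin index 0, 1 or 2
t_bin = np.digitize(tropomi, edges)

conf = np.zeros((3, 3), dtype=int)            # rows: ground, cols: TROPOMI
np.add.at(conf, (g_bin, t_bin), 1)
print(conf)                                   # diagonal = bin-level agreement
```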
I broadly agree with the limitations and potential avenues for future work presented in the paper, and encourage the author to pursue these, as I think many will be fruitful in enhancing the value of the analysis presented here.
Specific Comments
Title: Suggest specifying “cloud fraction product” rather than “atmospheric products” in the title.
Line 37: Suggest also adding reference to the Pandonia Global Network (https://www.pandonia-global-network.org/home/) for trace gases; there are several examples of TROPOMI validation with the network on the publications page (https://www.pandonia-global-network.org/home/documents/publications/).
Lines 40-43: the cited references (Schneider et al., 2019; Lewis et al., 2016) refer to low-cost air quality sensors for in-situ measurement, which is a very different problem from low-cost remote sensing. More directly comparable prior work might include the GLOBE network, a worldwide citizen science effort to validate remote sensing (see, for example, https://doi.org/10.1175/BAMS-D-19-0295.1), or the use of hand-held sun photometers in the Maritime Aerosol Network (https://doi.org/10.1029/2008JD011257). The Müller et al. (2020) paper seems to refer to a PTR-TOF-MS instrument, which is not a low-cost method (though I am not familiar with the whole contents of that paper).
Lines 49-55: Suggest moving these details on instrumentation into section 2.
Section 2.1.2: It might be useful to note the (approximate) cost of the instrumentation, considering the focus of this study on low-cost technologies. Though costs are noted in Line 323, I believe it is more logical to list these costs as the instrumentation is being introduced here.
Line 188: Noting 348 overpasses is potentially misleading; Figure 1 seems to indicate 21 overpasses. Later, it is noted that there were 276 satellite-ground observation pairs after temporal matching, representing a 79 % match rate, which is consistent with 348 satellite-ground observation pairs before temporal matching. This should be revised to distinguish between satellite overpasses (nearby overflights of the spacecraft) and paired observations across different products.
Lines 211-213: I think this can be emphasized more, i.e., the low precision of the measurements practically allowed only a few values of cloud fraction to be output from the instrumentation.
Line 221: There do not seem to be examples of cloud fraction <10% in the results.
Section 3.3: Suggest moving this (and Table 1) earlier to Section 3.1, when the discussion of data matching takes place.
Section 3.6: This discussion can be moved earlier, to motivate the discussion in other sections; it was unclear to me why these trace gas products were being mentioned until I read to this part of the manuscript.
Lines 344-348: These are important considerations for future work; filling gaps in global ground-based validation networks will require techniques that are robust against a lack of formal laboratory calibration capabilities in many regions. I suggest you carefully consider and emphasize these constraints throughout. For example, you recommend using inter-sensor differences to constrain measurement uncertainties (Section 3.4), which is an attractive approach when the sensors themselves are low-cost. This idea can be expanded on further.
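As one concrete form of that inter-sensor idea, the paired differences of two co-located low-cost sensors bound the single-sensor precision without any reference instrument; a sketch with invented readings, assuming equal and independent sensor noise:

```python
import numpy as np

# Simultaneous sky-temperature readings from two co-located sensors (deg C).
sensor_a = np.array([-18.2, -17.9, -5.1, -4.8, -12.3])
sensor_b = np.array([-17.8, -18.3, -4.6, -5.2, -11.7])

diff = sensor_a - sensor_b
# Var(A - B) = 2 * sigma^2 for equal, independent noise, hence:
sigma_single = diff.std(ddof=1) / np.sqrt(2)
print(f"estimated single-sensor precision: {sigma_single:.2f} deg C")
```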
Lines 352-354: Also the Sentinel-4 mission, recently launched.