Ground-Based Validation of Sentinel-5P TROPOMI Atmospheric Products using Calibration-Informed Low-Cost Multi-Spectral Sensors
Abstract. Ground-based validation of satellite atmospheric products is essential for ensuring data quality and algorithm performance. We present a validation approach for Sentinel-5P TROPOspheric Monitoring Instrument (TROPOMI) cloud fraction products using a multi-spectral ground station (DG2MCM-15) located in Kempten, Bavaria, Germany. The ground observatory combines professional metrological experience from ISO/IEC 17025 accredited laboratory environments with low-cost commercial sensors, creating a citizen science validation capability.
Our validation dataset comprises 276 temporally matched observations between Sentinel-5P overpasses and ground measurements over a four-week period (January 11 – February 8, 2026). Ground-based cloud detection using an MLX90614 infrared pyrometer achieves strong agreement with Sentinel-5P cloud fraction retrievals (Pearson R = 0.879, N = 27 after quality filtering). The root mean square error of 29.1 % cloud fraction reflects a systematic positive bias from spatial scale mismatch between the ground sensor field of view and satellite pixel dimensions. The method reliably distinguishes between clear, partially cloudy, and overcast conditions, though the derived cloud fraction values exhibit clustering due to the temperature-ratio approach used. Exploratory comparison with TROPOMI aerosol index products yielded negligible correlation due to the absence of UV spectral coverage in the ground sensor, identifying a clear instrumentation requirement for future aerosol validation work.
Temporal matching between satellite overpasses and ground observations achieved a mean time difference of 2.7 minutes, with 95 % of matches within 8 minutes of satellite observation time. Spatial co-location analysis confirms that all validation points fall within the nominal TROPOMI pixel footprint (3.5 km × 5.5 km at nadir), though the scale mismatch between the ground sensor field of view and the satellite pixel remains the primary source of validation uncertainty.
Our results demonstrate that low-cost infrared sensors, when operated with calibration-informed measurement protocols, can provide scientifically useful screening data for satellite cloud products, reliably distinguishing clear, partially cloudy, and overcast conditions. The quasi-discrete nature of the derived cloud fraction highlights the need for improved cloud detection algorithms in future work. This approach offers a scalable pathway for expanding ground-based validation networks in regions that lack dedicated atmospheric monitoring infrastructure.
General Comments
This paper explores a custom approach to validating satellite data products (focusing on cloud fraction) as a proof-of-concept for what might be achieved using relatively low-cost equipment.
The approach is well considered, and I appreciate the detailed treatment of methodology and limitations. As noted in the discussion, my primary concern is that there is too little quantitative information to draw meaningful conclusions; this results from both the relatively short sampling period and the limited precision of the cloud fraction derived from the ground-based sensor being tested. My main recommendation is therefore that the paper be revised and resubmitted after a longer data collection period, during which a broader range of meteorological conditions can be sampled, hopefully yielding a greater dynamic range in the results and enabling more robust conclusions.
Cloud fraction conclusions are stated somewhat too strongly in places, given the relatively short study period and the limited precision of the cloud fraction calculated from the MLX90614. In particular, the strong correlation reported is driven entirely by two points (Figure 2b); without these, the correlation would be approximately zero. Though this is acknowledged in the discussion (lines 288-292), the message could be clearer throughout, especially in the abstract.
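To make the leverage concern concrete, the following minimal sketch shows how two high-cloud-fraction points can dominate Pearson R even when the remaining cluster carries no internal correlation. All values are invented for illustration and are not the paper's data:

```python
import numpy as np

# Synthetic data: a low-cloud-fraction cluster with zero internal
# correlation, plus two overcast points. Values are made up.
ground  = np.array([0.25, 0.25, 0.50, 0.50, 0.95, 0.95])
tropomi = np.array([0.10, 0.30, 0.10, 0.30, 0.90, 0.98])

# Overall correlation is high, driven entirely by the two overcast points.
r_all = np.corrcoef(ground, tropomi)[0, 1]

# Within the cluster alone, the correlation is essentially zero.
r_cluster = np.corrcoef(ground[:4], tropomi[:4])[0, 1]
```

Reporting both numbers (full-sample R and cluster-only R) would communicate the limitation directly.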
Given those limitations, I would also suggest a binning approach for the comparisons of Figure 2: divide the TROPOMI cloud fractions into three bins centered on the MLX90614 cloud fraction clusters (< 38 %, 38-63 %, > 63 %) and produce a "confusion matrix" plot showing when measurements fell into the same broad bin versus when TROPOMI and the MLX90614 disagreed. This would give a better sense of the qualitative capabilities of the comparison.
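As a rough sketch of how such a confusion matrix could be assembled (the bin edges follow the suggestion above; the sample cloud fractions are illustrative values, not the paper's data):

```python
import numpy as np

# Bin edges matching the suggested MLX90614 clusters: < 38 %, 38-63 %, > 63 %
BIN_EDGES = [0.0, 0.38, 0.63, 1.0]

def cloud_fraction_confusion(tropomi_cf, ground_cf, edges=BIN_EDGES):
    """Assign both cloud fraction series to the same coarse bins and
    count agreements/disagreements as a confusion matrix
    (rows: ground bins, columns: TROPOMI bins)."""
    t_bins = np.digitize(tropomi_cf, edges[1:-1])  # 0, 1, or 2
    g_bins = np.digitize(ground_cf, edges[1:-1])
    n = len(edges) - 1
    matrix = np.zeros((n, n), dtype=int)
    for t, g in zip(t_bins, g_bins):
        matrix[g, t] += 1
    return matrix

# Illustrative (made-up) matched cloud fractions
tropomi = np.array([0.05, 0.50, 0.90, 0.40, 0.95])
ground  = np.array([0.10, 0.45, 0.80, 0.70, 0.92])
cm = cloud_fraction_confusion(tropomi, ground)
```

The diagonal of the matrix counts matched categories; off-diagonal cells show where TROPOMI and the ground sensor disagreed at this coarse level.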
I broadly agree with the limitations and potential avenues for future work presented in the paper, and I encourage the author to pursue these, as I think many will be fruitful in enhancing the value of the analysis presented here.
Specific Comments
Title: Suggest specifying “cloud fraction product” rather than “atmospheric products” in the title.
Line 37: Suggest also adding reference to the Pandonia Global Network (https://www.pandonia-global-network.org/home/) for trace gases; there are several examples of TROPOMI validation with the network on the publications page (https://www.pandonia-global-network.org/home/documents/publications/).
Lines 40-43: the cited references (Schneider et al., 2019; Lewis et al., 2016) refer to low-cost air quality sensors for in-situ measurement, which is a very different problem from low-cost remote sensing. More directly comparable prior work might include the GLOBE network, a worldwide citizen science effort to validate remote sensing (see, for example, https://doi.org/10.1175/BAMS-D-19-0295.1), or the use of hand-held sun photometers in the Maritime Aerosol Network (https://doi.org/10.1029/2008JD011257). The Müller et al. (2020) paper seems to refer to a PTR-TOF-MS instrument, which is not a low-cost method (though I am not familiar with the whole contents of that paper).
Lines 49-55: Suggest moving these details on instrumentation into section 2.
Section 2.1.2: It would be useful to note the (approximate) cost of the instrumentation, given this study's focus on low-cost technologies. Though costs are noted at line 323, it is more logical to list them when the instrumentation is introduced here.
Line 188: Noting 348 overpasses is potentially misleading; Figure 1 seems to indicate 21 overpasses. Later, it is noted that there were 276 satellite-ground observation pairs after temporal matching, representing a 79 % match rate, which is consistent with 348 satellite-ground observation pairs before temporal matching. This should be revised to distinguish between satellite overpasses (nearby overflights of the spacecraft) and paired observations across different products.
Lines 211-213: I think this point deserves more emphasis: the low precision of the measurements meant that, in practice, only a few discrete cloud fraction values could be output by the instrumentation.
Line 221: There do not seem to be examples of cloud fraction <10% in the results.
Section 3.3: Suggest moving this (and Table 1) earlier to Section 3.1, when the discussion of data matching takes place.
Section 3.6: This discussion can be moved earlier, to motivate the discussion in other sections; it was unclear to me why these trace gas products were being mentioned until I read to this part of the manuscript.
Lines 344-348: These are important considerations for future work; filling gaps in global ground-based validation networks will require techniques that are robust against a lack of formal laboratory calibration capabilities in many regions. I suggest you carefully consider and emphasize these constraints throughout. For example, you recommend using inter-sensor differences to constrain measurement uncertainties (Section 3.4), which is an attractive approach when the sensors themselves are low-cost. This idea can be expanded on further.
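To expand on the inter-sensor idea: with two (or more) co-located identical sensors, the spread of their reading differences bounds the single-sensor random noise without any laboratory reference, assuming independent, identically distributed errors. A minimal sketch with synthetic data (the 0.5 °C noise level and the temperature range are assumptions for illustration only):

```python
import numpy as np

# Synthetic co-located readings from two identical low-cost sensors.
# Assumed per-sensor random noise: 0.5 deg C (illustrative value).
rng = np.random.default_rng(42)
truth = rng.uniform(-20.0, 5.0, size=500)          # made-up sky temperatures, deg C
sensor_a = truth + rng.normal(0.0, 0.5, size=500)
sensor_b = truth + rng.normal(0.0, 0.5, size=500)

# For independent errors, sigma(a - b) = sqrt(2) * sigma_single,
# so the pairwise difference recovers the single-sensor noise level.
sigma_single = np.std(sensor_a - sensor_b, ddof=1) / np.sqrt(2)
```

Because the common "truth" cancels in the difference, this estimate requires no calibrated reference, which is exactly what makes the approach attractive for deployments outside accredited laboratory environments.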
Lines 352-354: Consider also mentioning the recently launched Sentinel-4 mission.