Enhancing nighttime cloud optical and microphysical properties retrieval using combined imager and sounder from geostationary satellite
Abstract. Accurate retrieval of cloud optical and microphysical properties (COMP) at night is important for monitoring changes in weather and climate systems. We enhance nighttime cloud optical and microphysical properties (NCOMP) retrieval by integrating data from a hyperspectral infrared sounder and a high-resolution imager on the same geostationary platform within a machine learning (ML) framework. Using geostationary satellite imager broadband thermal infrared (TIR) channels along with dozens of optimally selected hyperspectral IR (HIR) channels, we demonstrate substantial improvements over traditional TIR-channel-based methods. The HIR channels enhance sensitivity to cloud effective radius (CER) and cloud optical thickness (COT), particularly for optically thin clouds, reducing retrieval errors to 9.73 μm and 6.09, respectively, an accuracy improvement of approximately 10 %. The ML-based model preserves strong day-night continuity in COMP retrievals, retaining diurnal cloud information, although challenges remain for optically thick clouds. This work highlights the importance of GEO-satellite-based HIR sounders, which provide critical spectral information that complements imager data for cloud optical and microphysical property retrievals. Mid-wave IR (MWIR) channels in particular significantly improve COT retrieval. The proposed fusion approach offers a flexible retrieval framework applicable to future geostationary satellite systems for enhancing cloud property retrievals with diurnal information.
Status: final response (author comments only)
CC1: 'Comment on egusphere-2025-2928', Mengchu Tao, 06 Aug 2025
Thank you for your valuable work on Enhancing nighttime cloud optical and microphysical properties retrieval using combined imager and sounder from geostationary satellite.
I have a question regarding your methodology: how do you achieve the spatial matching between the satellite sounder and imager data, especially considering differences in their spatial resolutions and observation geometries? Could you elaborate on the procedures or algorithms you use to ensure accurate co-location of measurements from these two instruments?
Thank you very much in advance for your clarification.
CC2: 'Reply on CC1', Xinran Xia, 07 Aug 2025
Thank you for your insightful question regarding the spatial and temporal matching between GIIRS and AGRI data. Given the differences in their resolutions (GIIRS China region: 12 km / 2 h; AGRI full disk: 4 km / 15 min), we adopted the following approach:
For each GIIRS footprint, we identified the nearest AGRI pixels, typically a 3×3 block (9 pixels) centered on the GIIRS footprint. To ensure temporal consistency, we enforced a maximum time difference of 15 minutes between matched GIIRS and AGRI observations, guaranteeing that only near-contemporaneous measurements were paired.
Notably, since the geolocations of GIIRS long-wave infrared (LWIR) and mid-wave infrared (MWIR) channels exhibit slight differences in their pixel center coordinates, we performed separate matching procedures: (1) GIIRS LWIR with AGRI, and (2) GIIRS MWIR with AGRI. This dual-matching approach accounts for the instrument's spectral band-dependent geolocation characteristics, ensuring higher spatial alignment accuracy for both infrared regimes.
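For concreteness, the match-up described above can be sketched in a few lines (a minimal illustration assuming flat arrays of pixel-center coordinates and observation times; the function and variable names are hypothetical, not the code actually used in the study):

```python
# Minimal GIIRS-AGRI collocation sketch (illustrative only, not the study's code).
import numpy as np
from scipy.spatial import cKDTree

def collocate(giirs_lat, giirs_lon, giirs_time, agri_lat, agri_lon, agri_time,
              k=9, max_dt=np.timedelta64(15, "m")):
    """For each GIIRS footprint, find the k nearest AGRI pixels (~3x3 block)
    and keep the match only if all paired observations fall within max_dt."""
    # KD-tree on AGRI pixel centers; lat/lon treated as planar coordinates
    # for simplicity (a real implementation should use geodesic distances).
    tree = cKDTree(np.column_stack([agri_lat, agri_lon]))
    _, idx = tree.query(np.column_stack([giirs_lat, giirs_lon]), k=k)
    # Temporal consistency: |t_GIIRS - t_AGRI| <= 15 minutes for every member.
    dt = np.abs(giirs_time[:, None] - agri_time[idx])
    valid = (dt <= max_dt).all(axis=1)
    return np.flatnonzero(valid), idx[valid]  # matched GIIRS rows, AGRI blocks
```

Because the GIIRS LWIR and MWIR geolocations differ slightly, such a routine would simply be run twice, once per spectral regime, as described in the reply.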
Citation: https://doi.org/10.5194/egusphere-2025-2928-CC2
RC1: 'Comment on egusphere-2025-2928', Anonymous Referee #1, 30 Aug 2025
General comments
The authors present a machine learning framework to derive cloud optical and microphysical properties (COMP) from infrared brightness temperatures measured by a geostationary satellite. In their approach not only measurements from an imager but also from a sounder are used. The method allows retrieving COMP during nighttime, which is very useful since traditional methods are only available during daytime as they are based on shortwave radiance measurements. The paper indicates that IR-based COMP retrievals are feasible and can provide consistent results during day and night, albeit with large RMSE in both cloud optical thickness (COT) and cloud effective radius (CER). However, the added value of sounder data, which is the subject of this study, is not convincingly demonstrated.
The validation results indicate that COT from the IR-only model (Fig. 5) looks better than from the combined IR+LWIR+MWIR model (Fig. 6). Specifically, the combined model is strongly skewed towards lower values and does not produce COT larger than 50, while the IR-only model COT is more balanced. By not predicting high COT the combined model reaches a slightly lower RMSE but this appears to be rather an artifact than an achievement. Therefore, the only advantage of the combined model, the claimed marginal improvement in COT (CER is not improved), is actually not an improvement. Perhaps this is not surprising, since the spectral regions of the HIR sounder channels are basically covered by the imager channels, which furthermore have a much higher spatial resolution.
To make the manuscript suitable for publication, the claims of added value of the sounder for COMP retrievals would have to be removed or firmly demonstrated, while the specific comments below would also have to be considered.
Specific comments
P3, L73-79: MODIS and SEVIRI are not ‘satellite data application centers’. Also, Thies et al. (2008) is not a representative reference for SEVIRI.
P4, L116-117: With respect to which reference have these correlations been determined?
P5, L123: The reference Charles et al. (2024) is not included in the reference list. Furthermore, I cannot find a paper by Charles et al. Could it be that this is actually White et al. (2025, https://doi.org/10.1029/2024JD042829)?
P6, L155-159: It could be noted that Europe/Eumetsat launched a GEO sounder (IRS on MTG-S) in July 2025. In addition, I believe that monitoring temperature and humidity profiles rather than cloud/wind fields is the primary application of these instruments. Finally, the Lindsey reference could already be added in line 156 to reflect the planned US GEO IR sounder.
P6, L160-163: Here two research questions are formulated. However, the first one ‘What is the advantage of a GEO HIR sounder over a GEO IR imager for NCOMP retrieval?’ is not addressed. Only the added value of a HIR sounder (question 2) is studied. There are no experiments comparing sounder-based and imager-based retrievals.
P7, L194-199: The relocation of the satellite presumably did not take place on one day. It appears sub-optimal to use measurements from a month (March 2024) during which the satellite was drifting. In addition, more details on which data (days, time slots, ..) were used for training and validation would be welcome. Was there also a test dataset? And were these sets independent?
P7, Section 2.2: The procedure for collocation of GIIRS and AGRI data should be better explained (how is resampling done, what is ‘top to bottom’, ..?).
P9, L256: What does ‘are softmax to CLP’ mean?
P10, L298-299: Please check RTTOV credentials. I believe it is developed by the NWP SAF.
P11, L339-341: This looks like a duplication of earlier information in lines 333-336, while the sentence in between is about a different topic.
P12, L349-350: Should this be COT (rather than COMP) sensitivity? A sensitivity of 0.2 is nowhere reached for CER.
P12, L356-359: How does sensitivity to water vapour and temperature provide a theoretical basis for COMP retrieval?
P13, L384-389: Isn’t this a counterintuitive result? Adding the LWIR channels, which have relatively high sensitivity to COT, increases the retrieval error. In contrast, the MWIR channels, with much lower sensitivity to COT, reduce the retrieval error. How can this be explained?
P13, L403-405: What does it mean that COT shows better agreement than CER? How can the errors be compared?
P14, L410-412: An improvement in liquid cloud detection is mentioned with specific numbers, while the deterioration in ice cloud detection is disguised by stating that ‘it remains > 94%’. That is not balanced reporting.
P15, L442: The term droplets suggests that only liquid clouds were analysed here. However, the droplets appear to be much larger than what is commonly retrieved (CER<40 micron). That raises the question whether ice particles are also included. And – if not – how are the results for ice clouds?
P15, L468: Are all these clouds cirrus? COT appears to reach values of 50 and higher, which is not compatible with cirrus. Also on later occasions, it looks like cirrus has been used as a synonym for ice clouds.
P15, L469-470: Compared to which other models/retrievals is the performance ‘exceptional’?
P18, L547: The number 9.73 seems to refer to the CER RMSE in Figure 6. However, the reference RMSE in Figure 5 was 9.72. How is this a reduction?
Table 2: These numbers differ from those in Figures 5 and 6. Is that because they are based on a different dataset (validation versus test?) or because the figures are for a restricted range of COT and CER? If the latter restriction is thought to be more relevant, why was the model choice not based on that restricted dataset?
Figure 3: Is the RTTOV simulation, which is presumably based on many uncertain inputs, a proper reference? IASI seems better suited to serve as a reference, and biases with respect to IASI actually appear to be rather small.
Figure 6: The LWIR+MWIR model clearly has a problem with predicting high COT (panel d) and the histogram in panel e also deviates from AGRI-L2. Both these aspects are clearly better for the IR-only model in Figure 5. The RMSE of the combined model appears to be artificially lowered as a result of the LWIR+MWIR model not predicting high values (which blow up the RMSE). Hence, IR-only actually appears to be the better model for COT. This undermines the main conclusion of the paper.
Figure 9: Usually IR channels are inverted to give a more natural appearance of clouds in brighter colours.
Figure 9: How is COMP retrieved for mixed-phase clouds? From the images it looks like they are treated as liquid water clouds. Also, there are regions (e.g., western side of the area) where COT>0 and CER>0 but neither LWP nor IWP have a value. How can that be explained?
Technical comments
P2, L41: What is ML?
P2, L44: Explain GEO, also in main text at first occurrence.
P2, L46: Middle-wave -> Mid-wave
P3, L70: Introduce VIS/NIR abbreviations
P3, L80-82: Write full name followed by acronym/abbreviation in brackets.
P4, L97: Insert ‘are’ after that; no capital in Daytime; makes -> making
P4, L103: intelligent -> intelligence
P4, L112-117: Split up in three sentences.
P5, L130: offers -> offer
P8, L232: Start new sentence at ‘The detailed ..’.
P11, L334 and L336: Are the minus signs typos? The figures present sensitivities in absolute sense.
P13, L379-381: Check sentence.
P17, L510: Figure 10 should be Figure 11.
P18, L566-567: (e.g., MTG and successors of GOES-R).
P18, L570: Capitalize GFS.
P20, L605: Andi, W. should be Walther, A.
Citation: https://doi.org/10.5194/egusphere-2025-2928-RC1
RC2: 'Comment on egusphere-2025-2928', Anonymous Referee #2, 23 Sep 2025
RC3: 'Comment on egusphere-2025-2928', Anonymous Referee #3, 05 Oct 2025
This manuscript details experiments where a convolutional neural network was trained using infrared channels to match a daytime cloud optical properties product. The primary intended contribution of this work is the demonstration that a selection of hyperspectral IR channels can improve the estimation of daytime-like cloud properties at night. The overall objective is sound and appropriate for this journal, but I have several major concerns about this manuscript, listed below.
General Comments
1.) Improvements in model performance appear to be very small relative to the IR-only baseline. In Table 2, models that have lower MAE/RMSE in COT also have higher MAE/RMSE in CER when compared to the IR-only baseline. This raises the question of whether these models appear to be an improvement simply because they are neglecting CER in favor of COT. None of the models shown simultaneously improve CER and COT. The observed performance differences are small and may fall within the expected run-to-run variability due to stochasticity in training. Because of this, multiple training runs should be reported for each category in Table 2.
2.) Several models that include GIIRS channels in addition to the native imager channels perform significantly worse. At the worst, I would expect the additional information to have no effect on the model at all. This is mentioned occasionally in the text, but why this happens is not discussed.
3.) There is not enough detail on the separation of the training, validation and testing sets to ensure independence. Please include exact dates and locations used so readers can be certain there is not overlap between these datasets.
4.) When the authors refer to DCOMP or NCOMP algorithms and implementations it is often vague exactly which product, implementation or method is being referred to. Please be specific about which methods are being referred to, add references to supporting material about how they operate, and describe them briefly in the text.
5.) Results are occasionally not reported in a fair way. There are several cases (most apparent in the discussion of Table 2) where including GIIRS channels worsens model performance, but little explanation is offered. Spatial artifacts in the predictions as in Figure 10j,k,l are not discussed.
6.) Cloud phase is estimated by this model alongside CER and COT, but including cloud phase in this model is not properly motivated. The specific algorithm that produces the cloud phase, CER, and COT is not sufficiently discussed. It is not clear why CLP is being evaluated, and inclusion of GIIRS channels seems to worsen predictions of CLP without this being discussed in the text.
7.) These models are trained to match an operational AGRI product and contrasted against an infrared-only neural network baseline. However, when improvements in modeled COT and CER are characterized in the conclusions and abstract, they lack this very important bit of context. The specific contribution of this work is an improved statistical matching of the daytime product through using GIIRS observations relative to only using IR observations from the imager. In order to claim that these COT and CER estimates more closely match reality, an evaluation against a more reliable independent source is needed. The comparison to ERA5 comes close to this, but is not sufficiently motivated in the text. I am unfortunately not convinced that ERA5 is a reliable point of comparison regarding the diurnal variation of cloud optical properties.
Specific Comments:
Abstract Line 40-41: What is this reducing error with respect to? Is this compared against the daytime operational AGRI L2? Since this model is trained to match the daytime operational product, it is ambiguous whether this is referring to the AGRI L2 product or an independent validation source.
Line 78: Salomonson et al. 2002, Thies et al. 2008: These do not seem to be appropriate references for the sensors themselves, or the COMP products mentioned here.
Line 84: Andi et al., 2013 should be Walther et al. 2013.
Line 85: Which COMP algorithm, specifically? Is this referring to the Nakajima-King type retrieval algorithm, or a specific product mentioned in the previous paragraph?
Line 95: Which DCOMP algorithm, specifically?
Line 123: Charles et al. 2024 might be referring to White et al. 2024 and is not in the reference section. You may also wish to cite the peer-reviewed version (White et al. 2025) instead:
White, C. H., Noh, Y.-J., Haynes, J. M., & Ebert-Uphoff, I. (2025). Emulating daytime ABI cloud optical properties at night with machine learning. Journal of Geophysical Research: Atmospheres, 130, e2024JD042829. https://doi.org/10.1029/2024JD042829
Line 123: Given that White et al. 2025 is extremely similar to this work in terms of objective, methodology and analyses performed, a discussion of the similarities and differences is merited. The main differentiating factor is the inclusion of hyperspectral IR information, and this could be emphasized here.
Line 126: Which NCOMP product, specifically?
Line 195-200. Is it five days from each month or five days total? Please be more specific about which days, times of day, and locations were used in the training and validation datasets. Ideally there should be enough information so that one can recreate a reasonably similar dataset in order to reproduce your results.
Line 214: It’s not clear to me why there is a citation here to Walther et al. 2013. If it motivates the choices of thresholds here, please state that specifically.
Line 215-216: Based on the details given, it is not correct to state “transforming the model input data into a standard normal distribution with mean 0 and standard deviation 1.” Z-score normalization alone will not turn the data into a normal distribution. It may have zero mean and unit variance, but that does not make it normally distributed.
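The point is easy to verify numerically (a standalone illustration, not code from the manuscript): z-scoring a skewed sample fixes the mean and variance but leaves the shape of the distribution unchanged.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # strongly right-skewed input
z = (x - x.mean()) / x.std()                  # z-score normalization

print(round(z.mean(), 6), round(z.std(), 6))  # ~0 and ~1 by construction
print(stats.skew(x), stats.skew(z))           # skewness ~2 in both cases:
                                              # still far from normally distributed
```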
Line 219: How are the training, validation, and testing sets separated from each other to ensure independence? It is only stated that they came from the same months of the year (March, June, July). Are they from different days, years, or locations? If samples from the training, validation, and testing sets came from similar days, then we cannot know whether the model is overfitting to the test set. Without first ensuring that these datasets are reasonably independent, one cannot trust the results of this analysis.
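One standard way to enforce the independence the referee asks for is to assign whole days, rather than individual samples, to the three splits. A generic sketch, with hypothetical variable names:

```python
import numpy as np

def split_by_day(sample_days, train_frac=0.8, val_frac=0.1, seed=0):
    """Assign each calendar day to exactly one of train/val/test, so that
    samples from the same (highly correlated) scenes never appear in two sets."""
    days = np.unique(sample_days)
    rng = np.random.default_rng(seed)
    rng.shuffle(days)
    n_train = int(train_frac * days.size)
    n_val = int(val_frac * days.size)
    parts = {"train": days[:n_train],
             "val": days[n_train:n_train + n_val],
             "test": days[n_train + n_val:]}
    # Boolean masks over the original sample array, one per split.
    return {name: np.isin(sample_days, d) for name, d in parts.items()}
```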
Line 230 – 257 & Figure 1: I am not convinced that the transformer block at the bottleneck offers any value to this model. The inclusion of this feature is justified in the text by “[...] which employs self-attention mechanisms to capture long-range dependencies and global contextual information, further enriching the feature representations.” This model is given patches with a height and width of 32. After the 3 max-pooling layers shown in the figure, this would be a Bx256x4x4 latent representation. The transformer block would not substantially increase the receptive field of this model compared to a standard ConvNeXt block with the default kernel size (7x7). A ConvNeXt block in the bottleneck with a kernel size of 4x4 would have the same ability to capture long-range dependencies.
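The dimension bookkeeping behind this argument can be checked in a few lines (a sketch assuming standard 2x2 max pooling with stride 2; only the spatial size matters for the receptive-field point):

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 32, 32)        # one 32x32 input patch (channels omitted)
pool = nn.MaxPool2d(kernel_size=2)   # 2x2 max pooling, stride 2
for _ in range(3):                   # the three pooling stages in Figure 1
    x = pool(x)
print(x.shape)  # torch.Size([1, 1, 4, 4]): the bottleneck grid is only 4x4,
                # so a 4x4 (or default 7x7) ConvNeXt kernel already covers it fully
```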
Line 247: What kind of up-sampling?
Line 271: “We choose a batch size of 64 to capture as many local COMP features as possible” It is not clear to me what is meant by this statement.
Line 276: Should this be “and both GIIRS LWIR and MWIR”?
Line 348-Line368: The sensitivity analysis showed that the spectral features associated with the different cloud properties here are fairly broad and similar across many channels. For example, the differences we see in simulated BT across different CER in Figure 2 are fairly consistent across many bands. This makes me wonder how much unique information is in any given channel even when looking across the selected subset of 189 channels. It seems to me that this selection methodology would choose a large group of channels that are highly correlated with one another and don’t add much unique information relevant for the ML task.
Section 3.1: I think this section would be much more beneficial if you were able to contrast the sensitivity in GIIRS to the sensitivity you already have in the much wider bands present on the imaging instrument. The potential benefit from including GIIRS must come from the additional spectral information not already present on the imager. The lack of finer-scale differences in the spectral features seen in Figure 2 leads me to think that these sensitivities may already be represented on the lower-spectral resolution imaging channels. You might consider adding another dimension to your selection methodology that gives preference to wavelengths not well represented on the imager. Currently, it is not clear to me that this selection methodology is adding any spectral information in LWIR that is not already present in the imager data.
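The referee's suggestion could be implemented, for example, as a greedy selection that rewards per-channel sensitivity while penalizing redundancy with channels already chosen, seeding the selection with the channels spectrally covered by the imager. An mRMR-style sketch with hypothetical inputs, not the manuscript's method:

```python
import numpy as np

def select_channels(sensitivity, bt, seed_idx=(), n_select=30, alpha=0.5):
    """Greedy redundancy-penalized channel selection.

    sensitivity : (n_channels,) sensitivity score per candidate channel
    bt          : (n_samples, n_channels) simulated brightness temperatures
    seed_idx    : channels treated as already selected (e.g., wavelengths
                  well represented by the broadband imager channels)
    """
    sensitivity = np.asarray(sensitivity, dtype=float)
    corr = np.abs(np.corrcoef(bt, rowvar=False))  # |inter-channel correlation|
    selected = list(seed_idx)
    remaining = [i for i in range(sensitivity.size) if i not in selected]
    picked = []
    while remaining and len(picked) < n_select:
        if selected:
            # Average correlation of each candidate with the selected set.
            redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = sensitivity[remaining] - alpha * redundancy
        best = remaining.pop(int(np.argmax(scores)))
        selected.append(best)
        picked.append(best)
    return picked
```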
Table 2: When differences in model performance are this similar, one needs to compare the average of multiple model initializations. Ideally at least 3-10 runs for each category. These differences, particularly for CER, are in the range of what one might expect from variance in the random seed used.
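Operationally, the requested comparison amounts to wrapping the training pipeline in a loop over seeds and reporting spread alongside the mean (a generic sketch; `train_and_evaluate` is a hypothetical stand-in for the authors' pipeline, assumed to return COT and CER RMSE):

```python
import numpy as np

def report_over_seeds(train_and_evaluate, seeds=range(5)):
    """Train one model per random seed and report mean +/- std of the metrics,
    so configuration differences can be judged against run-to-run variability."""
    metrics = np.array([train_and_evaluate(seed=s) for s in seeds])  # (n_runs, 2)
    mean, std = metrics.mean(axis=0), metrics.std(axis=0, ddof=1)
    print(f"COT RMSE: {mean[0]:.2f} +/- {std[0]:.2f}")
    print(f"CER RMSE: {mean[1]:.2f} +/- {std[1]:.2f} um")
```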
Table 2: Are these results from the training set or validation set? Please be specific about where the data to produce this table came from.
Table 2: All of the models shown in Table 2 that have lower COT MAE, all have higher CER MAE. It is a very reasonable conclusion that these models are estimating more accurate COT at the expense of a less accurate CER. There needs to be a more thorough accounting of these differences.
Line 391: Which days from June 2024? It is not clear from the text if this is cleanly separated from the training data, so readers cannot be confident that this is an independent evaluation.
Line 394: This manuscript would benefit from describing the specific AGRI L2 COMP product and the source of the target cloud phase information that the ML model is trained to match. Perhaps this is best to include earlier in the manuscript.
Line 425 – 428: I feel that the language needs to be more specific here. Roughly half of the models that include GIIRS channels underperform the IR-only baseline in Figure 7a-c. “[...] with GIIRS channels providing slight improvement by approximately 0.5 μm, while COT retrieval demonstrates more significant enhancement (RMSE decreasing from 1.0-2.2 to 0.5-1.7).” This statement does not seem to be a fair interpretation of what we see in this figure. While there are models that outperform the IR-only baseline, it is often a different model in each subplot that has the lowest RMSE. More discussion is needed of these results, including comments on why the addition of GIIRS channels frequently makes for a worse model compared to the IR-only baseline.
Figure 9: What are the box-shaped artifacts in the 9.k and 9.l?
Line 496-500: There are several features with COT<20 that appear to be smoothed over in the southwest.
Line 515-520: This is much too strong of a statement to make given the evidence provided. Since this model is trained on daytime data, the most pressing concern a reader will have is about generalization to nighttime scenes. Answering this question requires a more independent and thorough evaluation because it involves generalizing outside the source domain of these models. Aligned texture in a Level-2 imager CTH field between model retrievals of CER and COT is not enough evidence to make this claim.
Line 528: Why is this a characteristic pattern?
Line 550: “providing spatially consistent retrievals compared to reference data” does not seem to be supported by the analysis in this paper. Predicted CER and COT appear significantly smoother than the reference L2 product and contain box-like artifacts in every image.
Technical corrections:
Line 219-220: Should this be “This processing yielded 2,568,230 samples for training and 492,153 samples for testing and validation” (commas shifted one place)?
Figure 1: The height and width dimensions are not correctly represented in this figure. After a 2x2 Max Pooling layer, the dimensions should be B x 64 x 16 x 16 based on the description.
Figure 9b-c: IR channels are typically displayed with the greyscale color map inverted from what is shown here.
Line 511: Should this list Figure 11a instead?
Citation: https://doi.org/10.5194/egusphere-2025-2928-RC3