This work is distributed under the Creative Commons Attribution 4.0 License.
Above Cloud Aerosol Detection and Retrieval from Multi-Angular Polarimetric Satellite Measurements in a Neural Network Ensemble Approach
Abstract. This paper describes an algorithm for above-cloud aerosol (ACA) retrievals from PARASOL (Polarisation and Anisotropy of Reflectances for Atmospheric Science coupled with Observations from a Lidar) multi-angle polarimetric measurements. The algorithm, based on neural networks (NNs), has been trained on synthetic measurements and applied to one year of PARASOL data. It uses three NNs in sequence: 1) one for the detection of liquid clouds, 2) one for the retrieval of aerosol properties in ACA cases, and 3) an NN forward model to evaluate the goodness of fit of the retrieval. The theoretical retrieval capability of the NNs is investigated in several synthetic-data studies. These show that the NN is able to retrieve the above-cloud aerosol optical thickness (ACAOT), Ångström exponent (AE), and single scattering albedo (SSA) with root mean squared errors (RMSEs) of ~0.1 on ACAOT, ~0.4 on AE, and ~0.04 on SSA. Finally, a comparison between the NN retrievals and adjacent PARASOL-RemoTAP clear-sky retrievals in 2008 shows good agreement, within the range expected from the synthetic study.
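As an orientation for readers, the following is a minimal sketch of the kind of three-stage chain the abstract describes. Everything here is a hypothetical placeholder — the `*_nn` model objects, the `sigma` noise vector, the 0.5 cloud threshold, and the chi-square screening — and none of it reproduces the paper's actual architectures or interfaces.

```python
import numpy as np

def retrieve_aca(measurement, sigma, cloud_mask_nn, retrieval_nn, forward_nn,
                 chi2_max=2.0):
    """Hypothetical sketch of a three-NN ACA chain for one pixel.

    `measurement` and `sigma` are 1-D arrays of the multi-angle
    (polarimetric) observations and their assumed noise; the *_nn
    objects are assumed to expose a scikit-learn-style predict().
    """
    x = measurement[None, :]
    # Stage 1: liquid-cloud detection; only pixels with a liquid cloud
    # continue to the retrieval stage.
    if cloud_mask_nn.predict(x)[0] < 0.5:
        return None
    # Stage 2: retrieve the aerosol state vector (e.g., ACAOT, AE, SSA).
    state = retrieval_nn.predict(x)[0]
    # Stage 3: the surrogate forward model re-simulates the measurement
    # from the retrieved state; a large reduced chi-square flags the
    # retrieval as unphysical, and the pixel is screened out.
    simulated = forward_nn.predict(state[None, :])[0]
    chi2 = np.mean(((measurement - simulated) / sigma) ** 2)
    return state if chi2 <= chi2_max else None
```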
Competing interests: At least one of the (co-)authors is a member of the editorial board of Atmospheric Measurement Techniques.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: closed
RC1: 'Comment on egusphere-2025-1488', Anonymous Referee #1, 19 May 2025
The manuscript presents a neural network-based approach to above-liquid-cloud aerosol retrievals from the multi-angle polarimetric measurements by PARASOL. The method utilizes 3 separate NNs: one to determine if there is a liquid cloud layer, one to perform the ACA retrieval, and one for an approximate forward model for goodness-of-fit evaluation. Seasonal distributions of above-liquid-cloud aerosol optical thickness appear to generally agree with past studies. However, the manuscript lacks crucial details and tests that preclude its publication in the present state. Detailed comments are provided below.
0. A high-level comment: there are some acronyms that are defined in multiple places in the manuscript (for example, see comment #3), as well as places where the defined acronyms are not used, e.g., lines 14-15 spell out "above cloud aerosol" and "direct radiative effect" rather than using the ACA and DRE acronyms that were defined in the sentences before it. It would be good to define these acronyms only at their first usage and consistently use them throughout the manuscript.
1. Line 7: "ACAOT (above cloud aerosol optical depth)" should be changed for consistency, either to "ACAOT (above cloud aerosol optical thickness)" or "ACAOD (above cloud aerosol optical depth)".
2. Line 64-65: "The level 1 data are provide on a common sinusoidally grid of approximately with ground pixels of approximately 6 × 6 km2." - This sentence phrasing is confusing and I am not entirely sure of its intended meaning. Please rephrase it to more clearly convey the intended meaning.
3. Line 89: The AOT acronym was defined earlier in the text on line 45; it doesn't need to be re-defined here.
4. Section 3.2: While this section does a great job explaining how the training data were produced, it lacks sufficient detail for other aspects relevant to NN training:
a) Lines 154-155: Real measurements would have noise that varies across different measurements. Why is the noise assumed to be a constant relative noise of 0.02 for the intensity and a constant absolute noise of 0.012 for DoLP? These seem arbitrarily chosen, based on the provided information. This also suggests that the model will not be as accurate when noise levels deviate from these assumed values. Were other noise levels considered? See also comment #5d below.
b) Lines 162-163: How was the ensemble size for each model component determined?
c) Lines 159-161: Regarding the description of NN ensembles, the current phrasing suggests that the described approach ("the whole training set is equally and randomly divided into several parts") is the only way to perform this, but in reality there are many other methods to achieve NN ensembles that do not follow this procedure (see, e.g., Dietterich, 2000). Please rephrase this sentence so that it is clear that this is your elected methodology to achieve an ensemble of NNs, not that this is the only methodology to achieve it (a minimal sketch of this partition-based scheme is given after this list of sub-comments).
d) Lines 165-166: For the cloud mask and retrieval NN, how was the measurement noise model determined? These values seem arbitrarily chosen based on the provided information.
e) Line 169: How was the batch size of 12,000 selected for this problem, and were other batch sizes considered? This is an unusual value, as typical batch sizes are chosen as 2 to some power, for efficiency when using a GPU (see, e.g., Kandel & Castelli, 2020). This is also unusual given the magnitude, as batch sizes are often significantly smaller than this; existing literature suggests that small batch sizes perform better (e.g., Bengio, 2012; Masters & Luschi, 2018; Kandel & Castelli, 2020).
f) Line 170: "mean root square error (RMSE)" should be fixed to "root mean square error (RMSE)".
g) Lines 170-172: How were the model architectures determined? It is surprising that a given model has the same number of neurons in each layer, and that all 3 models have the same number of hidden layers.
h) What activation functions were used, and how were they determined?
i) What learning rates were used to train these models? How were those learning rates determined?
j) Line 119: How were the 8 million samples split into subsets for training, validation, and testing?
k) Line 144: How did you determine the number of leading PCs to use for each variable? How much explained variance do these PCs comprise? Did you first attempt this without using PCA and find poor results?
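To make comment 4c concrete, here is a minimal sketch of the partition-based ensembling the manuscript describes (one of several valid strategies, per Dietterich, 2000), using scikit-learn. The data shapes, ensemble size, and architecture below are illustrative stand-ins, not the paper's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(8000, 40))      # stand-in for measurement vectors
y = rng.normal(size=(8000, 3))       # stand-in for (ACAOT, AE, SSA)

K = 4                                              # illustrative ensemble size
folds = np.array_split(rng.permutation(len(X)), K) # equal, random, disjoint parts

# Train one ensemble member per disjoint partition of the training set.
members = []
for fold in folds:
    nn = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=300)
    members.append(nn.fit(X[fold], y[fold]))

# The ensemble prediction is the member mean; the member spread gives
# a crude per-pixel uncertainty estimate.
preds = np.stack([m.predict(X[:5]) for m in members])
ens_mean, ens_spread = preds.mean(axis=0), preds.std(axis=0)
```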
5. Section 4.1 is missing some key synthetic tests/results:
a) What is the confusion matrix for the NN liquid cloud mask model? This is a crucial table to provide for any classification NN to better understand the rate of true/false positives/negatives and thereby the reliability of the NN for this task (an illustrative sketch of this check is given after this comment's sub-items).
b) What is the accuracy of the goodness-of-fit determination by the NN forward model? For a given state vector retrieved by the NN retrieval model, a forward model can be computed by either the NN forward model or the physics-based forward model. Doing both for all cases considered in this synthetic test will enable the determination of the reliability of the NN forward model for this task based on metrics like the correlation coefficient, coefficient of determination, RMSE, etc. (see the sketch after comment #11, where this test is raised again).
c) The correlations (I assume the R correlation coefficient? Please be explicit) reported here suggest that the NN model poorly captures AE behavior, especially under fine- or dust-dominated conditions. This is also the case for SSA under dust-dominated conditions. This seems to suggest that the retrieval method is only weakly sensitive to these parameters. How does this compare with physics-based retrievals of these parameters under these conditions? If it similarly struggles, it would be good to discuss this and clearly point out that this is not a limitation of the NN approach. However, if the physics-based retrievals do not have this issue, then it suggests that the presented NN-based approach is not optimal.
d) Related to comment #4a: How does the model perform when applied to synthetic data with a different noise level than that assumed in Sec. 3.2? It would be helpful to understand how this impacts the results, given that real measurements will not exactly follow the noise model assumed when generating the training data.
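As an illustration of the confusion-matrix check requested in 5a, the sketch below computes one with scikit-learn. The labels and NN scores are synthetic placeholders standing in for the cloud mask NN's output on a test set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=10_000)                      # 1 = liquid cloud present
scores = np.clip(y_true + rng.normal(0, 0.3, 10_000), 0, 1)   # stand-in NN scores
y_pred = (scores >= 0.5).astype(int)                          # thresholded cloud mask

# For binary labels {0, 1}, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print(f"precision={tp / (tp + fp):.3f}  recall={tp / (tp + fn):.3f}")
```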
6. Section 4.2 / Figure 3:
a) For Fig 3's a-c panels, it looks as if only a single retrieval was performed at the 4 chosen optical depths. I assume this is really showing the mean RMSE over all 10,000 retrievals at a given optical depth, rather than the RMSE of a single retrieval as the labeling and caption currently indicate? For clarity it would be good to include error bars showing the standard error of the mean, which will better show whether the observed trends are statistically significant. Please also update the caption to clarify that these RMSE values are averaged over the 10,000 cases considered at each COT value (a sketch of this aggregation is given after this comment).
b) Am I correctly inferring that the "number of remaining pixels" means the number of cases that were not screened out by the retrieved cloud fractions or goodness-of-fit metric? Assuming this is the case, rather than reporting the absolute number of pixels in Fig 3's d-f panels, it would be clearer to report the fraction of pixels where retrievals weren't screened out or, conversely, the fraction of pixels that were screened out.
7. Lines 219-221: I don't necessarily agree with this statement. Looking at SSA, the synthetic test found a correlation of ~0.76, with the correlation lowest for the dust-dominated cases at ~0.56. Meanwhile, in Fig 4, the SSA correlation is even lower at ~0.37. Fig 4 also shows a >27% increase in RMSD for SSA when compared to the RMSE of the synthetic tests. From this, I would conclude that the AE is more comparable for above-cloud and adjacent clear-sky retrievals than AOT or SSA.
8. Line 245: "The high AE and low SSA is an expected feature of the smoke in mid-Africa." Can you please provide a reference for this statement?
9. Lines 245-247: Such a large disagreement in AE suggests some fundamental or systematic difference between the method considered here vs. that considered by Waquet et al. (2013), and it would be good to understand the origin of this difference. Is there a reasonable explanation why the reported AEs are less than 1/2 that reported by Waquet et al. in these midlatitudes? Does Waquet et al. also only consider above-liquid-cloud AE? If not, given that above-ice-cloud AE is disregarded here, is that perhaps a common occurrence and the results of Waquet et al. are elevated due to that above-ice-cloud AE? Does the Waquet et al. approach overestimate ACAOT, or does the RemoTAP method considered here underestimate AE? Did Waquet et al. similarly consider the differences between above-cloud AE and adjacent clear-sky AE? Hopefully these questions can help to better elucidate the origin of this stark difference.
10. Lines 255-257: I don't agree that the synthetic experiments indicate the NNs have the ability to retrieve AE from fine- and dust-mode-dominated aerosol. Figure 3 shows that the performance is poor in these regimes: the NN underestimates the AE for fine-dominated cases, and it overestimates the AE for dust-dominated cases.
11. Lines 273-275: "... the NN-based surrogate forward model, just like the full-physical model, can provide goodness-of-fit mask to filter unphysical retrievals, which may due to imperfect cloud mask or some challenging aerosol/cloud/surface combination.":
a) I don't see where this is substantiated in the manuscript; no comparisons are performed between the NN's goodness-of-fit calculation and a physics-based model's corresponding goodness-of-fit metric. Please perform the test in comment #5b to substantiate this claim (a sketch of that test is given below).
b) The last clause of this sentence is phrased awkwardly, and I cannot discern the intended meaning in the context of the rest of the sentence. Please rephrase this so that it clearly conveys the intended meaning.
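Finally, a sketch of the surrogate-versus-physics consistency test requested in 5b and 11a: for each retrieved state, both forward models simulate the measurement vector, and the agreement is summarized by per-case RMSE and an overall correlation. Both forward-model callables are hypothetical stand-ins for the paper's NN surrogate and its physics-based radiative transfer model.

```python
import numpy as np

def compare_forward_models(states, forward_nn, forward_physics):
    """Per-case RMSE and overall Pearson R between the NN surrogate and
    the physics-based forward model, evaluated on the same state vectors.

    Both callables are assumed to map a 1-D state vector to a simulated
    measurement vector of fixed length.
    """
    y_nn = np.stack([forward_nn(s) for s in states])       # surrogate output
    y_rt = np.stack([forward_physics(s) for s in states])  # physics-based output
    rmse = np.sqrt(np.mean((y_nn - y_rt) ** 2, axis=1))    # per-case RMSE
    r = np.corrcoef(y_nn.ravel(), y_rt.ravel())[0, 1]      # overall correlation
    return rmse, r
```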
Citation: https://doi.org/10.5194/egusphere-2025-1488-RC1
- AC1: 'Reply on RC1', Zihao Yuan, 25 Jul 2025
RC2: 'Comment on egusphere-2025-1488', Anonymous Referee #2, 09 Jun 2025
- AC2: 'Reply on RC2', Zihao Yuan, 25 Jul 2025