the Creative Commons Attribution 4.0 License.
Analysis of cloud fraction adjustment to aerosols and its dependence on meteorological controls using explainable machine learning
Abstract. Aerosol-cloud interactions (ACI) have a pronounced influence on the Earth’s radiation budget but continue to pose one of the most substantial uncertainties in the climate system. Marine boundary-layer clouds (MBLCs) are particularly important since they cover a large portion of the Earth’s surface. One of the biggest challenges in quantifying ACI from observations lies in isolating adjustments of cloud fraction (CLF) to aerosol perturbations from the covariability and influence of the local meteorological conditions. In this study, this isolation is attempted using nine years (2011–2019) of near-global daily satellite cloud products in combination with reanalysis data of meteorological parameters. With cloud-droplet number concentration (Nd) as a proxy for aerosol, MBLC CLF is predicted by region-specific gradient boosting machine learning models. By means of SHapley Additive exPlanation (SHAP) regression values, CLF sensitivity to Nd and meteorological factors as well as meteorological influences on the Nd–CLF sensitivity are quantified. The regional ML models are able to capture on average 45 % of the CLF variability. Global patterns of CLF sensitivity show that CLF is positively associated with Nd, in particular in the stratocumulus-to-cumulus transition regions and in the Southern Ocean. CLF sensitivity to estimated inversion strength (EIS) is ubiquitously positive and strongest in tropical and subtropical regions topped by stratocumulus and within the midlatitudes. Globally, increased sea surface temperature (SST) reduces CLF, particularly in stratocumulus regions. The spatial patterns of CLF sensitivity to horizontal wind components in the free troposphere point to the impact of synoptic-scale weather systems and vertical wind shear on MBLCs. The Nd–CLF relationship is found to depend more on the selected thermodynamical variables than dynamical variables, and in particular on EIS and SST. 
In the midlatitudes, a stronger inversion is found to amplify the Nd–CLF relationship, while this is not observed in the stratocumulus regions. In the stratocumulus-to-cumulus transition regions, the Nd–CLF sensitivity is found to be amplified by higher SSTs, potentially pointing to Nd more frequently delaying this transition in these conditions. The expected climatic changes of EIS and SST may thus influence future forcings from ACIs. The near-global ML framework introduced in this study produces a better quantification of the response of MBLC CLF to aerosols taking into account the covariations with meteorology.
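Here is a minimal sketch of how a SHAP-based sensitivity could be computed. It assumes the sensitivity is the least-squares slope of a predictor's SHAP values against the predictor itself, which is the "slope for the SHAP values" Referee #2 refers to below. The toy model `toy_clf`, its coefficients, the three predictors (Nd, EIS, SST), and the background set are hypothetical stand-ins for a trained gradient boosting model and its data. The Shapley values come from brute-force enumeration of feature coalitions with an interventional (background-averaged) value function. That approach only works for a handful of features, and tree ensembles are normally explained with tree-specific algorithms such as TreeSHAP.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def shapley_values(model, x, background):
    """Exact interventional Shapley values of `model` at point `x`.

    The value of a coalition S is the mean model output over the background
    rows, with the features in S fixed to their values in `x`.
    """
    n = len(x)

    def coalition_value(coalition):
        outputs = [
            model([x[j] if j in coalition else row[j] for j in range(n)])
            for row in background
        ]
        return mean(outputs)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (coalition_value(s | {i}) - coalition_value(s))
        phi.append(total)
    return phi


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


# Hypothetical linear stand-in for a trained CLF model; predictors are [Nd, EIS, SST].
def toy_clf(z):
    nd, eis, sst = z
    return 0.3 + 0.002 * nd + 0.02 * eis - 0.01 * sst


background = [[50.0, 2.0, 290.0], [100.0, 4.0, 292.0],
              [150.0, 6.0, 294.0], [200.0, 8.0, 296.0]]
points = [[nd, 5.0, 293.0] for nd in (40.0, 80.0, 120.0, 160.0, 200.0)]

phis = [shapley_values(toy_clf, p, background) for p in points]
nd_slope = ols_slope([p[0] for p in points], [phi[0] for phi in phis])
print(round(nd_slope, 6))  # 0.002
```

For a purely linear model, the fitted slope recovers the Nd coefficient exactly. With interactions or nonlinearity, the SHAP value of Nd at each point also depends on the other predictors. This is the "situational" dependence that Referee #2 points out.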
Status: closed
RC1: 'Comment on egusphere-2023-1667', Anonymous Referee #1, 29 Aug 2023
This is an interesting study of the factors influencing cloud fraction using explainable machine learning approaches. The data and method are both solid, and the paper is well written. I found several places that need to be justified or clarified. Therefore, I recommend minor revision before this paper is published in ACP.
Line 90: Does re mean CLF? How are cloud temperature, solar zenith viewing angle, and satellite zenith angle used to filter and compute Nd?
Line 110: How are the reanalysis data harmonized? Could you please provide a little more detail on which spatial averaging (interpolation) techniques have been used?
Line 114: Why use the 99th percentile as the threshold? How will extreme values influence your interpretation of the ML model?
Line 136: Regarding the threshold of 6000 data points, could you please provide an estimate of the percentage of data dropped from the total sample?
Line 180: The definition of IAI may need to be clarified. Is it a slope or a difference? If a difference, how is it calculated, i.e., which minus which?
Line 214: How do you explain that PF influences CLF but not vice versa? The same question applies to some other predictors in the ML model.
Line 229: SHF. Please define the acronym in full when it first appears in the text, even if it is listed in Table 1.
Section 3.3.2, Fig. 8: Why is RH850 omitted? It shows higher importance than SST in Fig. 7.
Citation: https://doi.org/10.5194/egusphere-2023-1667-RC1
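The following sketch shows one way to report the two quantities Referee #1 asks about at lines 114 and 136. The comments do not say how either threshold is applied, so two assumptions are made. First, values above the 99th percentile are discarded rather than clipped. Second, the 6000-point threshold is a minimum sample count per region. The function names and example data are hypothetical. The percentile uses linear interpolation between order statistics, the same convention as NumPy's default `percentile` method.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between sorted values (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def drop_above_percentile(values, q=99.0):
    """Hypothetical filter: discard values above the q-th percentile."""
    cutoff = percentile(values, q)
    kept = [v for v in values if v <= cutoff]
    dropped_fraction = 1.0 - len(kept) / len(values)
    return kept, cutoff, dropped_fraction


def regions_passing(counts, min_samples=6000):
    """Hypothetical filter: keep regions with at least `min_samples` points.

    Returns the kept regions and the fraction of all samples that were dropped.
    """
    kept = {r: n for r, n in counts.items() if n >= min_samples}
    dropped_fraction = 1.0 - sum(kept.values()) / sum(counts.values())
    return kept, dropped_fraction


# Example with made-up data: 200 values and three regions.
vals = [float(v) for v in range(1, 201)]
kept, cutoff, frac = drop_above_percentile(vals, 99.0)
regions, region_frac = regions_passing({"A": 10000, "B": 5000, "C": 8000})
print(len(kept), round(cutoff, 2), round(frac, 4))
print(sorted(regions), round(region_frac, 4))
```

Reporting the dropped fraction next to each threshold lets a reader judge how strongly the filtering could affect the fitted models.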
RC2: 'Comment on egusphere-2023-1667', Anonymous Referee #2, 30 Sep 2023
This manuscript fits xgboost models to capture daily cloud fraction at 5° × 5° scale. The authors apply the SHAP calculation and present results mostly in terms of the SHAP values and their variations with different values, obtained by holding individual variables out of the model calculations. The authors argue that the results represent a 'better quantification of the responses of MBLC CLF to aerosols'. The topic is relevant for ACP. However, I have a few major methodological concerns, detailed below, which in my opinion must be addressed before the paper can be published.
1. It is unclear what data constitute MBL CLF. Only daily MODIS level-3 data are mentioned, but AFAIK, MODIS daily data do not have a field called MBL CLF. The authors need to be clear on this.
2. It is unclear how the data are standardized. The standardization procedure itself is easy to understand, but is it done globally or locally for each 5° × 5° box?
3. Some inappropriate citations were noted. For example, line 37: 'Furthermore, MBLCs are especially susceptible to aerosol perturbations due to their physical properties (Wood et al., 2015).' This is a very vague and general statement with a field campaign overview paper as a reference. As a researcher who has worked on this topic for a long time, I cannot follow what is being said or cited here. Line 45: these two references are relevant, but they are neither the earliest nor the best papers explaining the mechanism mentioned here. Line 50: Yuan et al. (2011) showed an increase of cloud fraction with aerosols in the trade cumulus region. Line 52: observational evidence of GCM overestimation of the LWP adjustment was presented convincingly in Toll et al. (2019). Line 102: potential retrieval biases are extensively discussed in Zhang et al. (2011). Grosvenor et al. (2018) has relevant information, but it is not specifically on this subject.
4. SHAP values, as the authors correctly noted, are only ONE way of attempting to explain boosted tree models. For each data point, there is a SHAP value for each explanatory variable. By construction, they are 'situationally' dependent. They do not really provide any physical insight. All the algorithm is trying to do is fit the data as well as possible via gradient boosting. In fact, the first figure shows that the way the authors use SHAP values to 'explain' results is not physical. Figure 1a shows that the SHAP value for Nd generally gets larger with increasing Nd. Physically, this says that when clouds are more polluted, cloud fraction tends to increase more strongly with Nd. This runs against our physical understanding. Fitting a slope to the SHAP values and claiming that this shows the sensitivity of CLF to Nd is not valid, IMO. I'd love to hear the authors' rationale here.
5. What follows in the manuscript is thus questionable. I will reserve my comments for the next version after the authors address the important methodological question.
Citation: https://doi.org/10.5194/egusphere-2023-1667-RC2
AC1: 'Comment on egusphere-2023-1667', Yichen Jia, 27 Oct 2023
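Referee #2's second point distinguishes two standardization schemes. The sketch below implements both, using hypothetical box identifiers and values. Global standardization applies one mean and one standard deviation to the whole dataset, so differences between boxes survive. Per-box standardization removes each box's own mean and scale, so only within-box variability remains. Which scheme the study uses is the referee's open question and is not assumed here.

```python
from statistics import mean, pstdev


def standardize_global(records):
    """records: list of (box_id, value). One mean/std (population) over all boxes."""
    values = [v for _, v in records]
    mu, sd = mean(values), pstdev(values)
    return [(b, (v - mu) / sd) for b, v in records]


def standardize_per_box(records):
    """Separate mean/std (population) for each box.

    A box whose values are all identical has zero spread and raises ZeroDivisionError.
    """
    by_box = {}
    for b, v in records:
        by_box.setdefault(b, []).append(v)
    stats = {b: (mean(vs), pstdev(vs)) for b, vs in by_box.items()}
    return [(b, (v - stats[b][0]) / stats[b][1]) for b, v in records]


# Two hypothetical boxes with very different mean levels.
records = [("box1", 10.0), ("box1", 20.0), ("box2", 100.0), ("box2", 200.0)]
global_z = standardize_global(records)
per_box_z = standardize_per_box(records)
print([round(z, 3) for _, z in global_z])   # box1 stays below box2
print([round(z, 3) for _, z in per_box_z])  # each box maps to [-1, 1]
```

Under per-box scaling, the between-box contrast in mean level disappears. This is why the choice matters when models and sensitivities are compared across regions.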
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
609 | 257 | 38 | 904 | 31 | 29