Correcting Aerosol Extinction Coefficient Vertical Structure Biases in GEOS-Chem via a Physics-Informed Transformer with Physical Mechanism Diagnosis

Xiong, Jiajun; Wang, Yi; Wang, Jun; Wang, Yanyu; Zhou, Meng; Tao, Minghui; Dong, Wenhui; Kim, Jhoon; Wang, Lunche

doi:10.5194/egusphere-2026-397

Preprints

https://doi.org/10.5194/egusphere-2026-397

17 Feb 2026

Abstract. We propose a physics-informed Transformer framework to correct biases in the Aerosol Extinction Coefficient (AEC, km^-1) profiles simulated by GEOS-Chem. Unlike a standard Transformer, our framework features a dual-stream architecture with explicit physical constraints. It employs Gated Feature Fusion to integrate vertical structures (combining GEOS-Chem priors with MERRA-2 profiles) by dynamically identifying height-dependent drivers, and leverages Cross-Attention to incorporate MERRA-2 surface environmental constraints for modulating AEC vertical reconstruction with synoptic contexts. This approach effectively predicts systematic biases relative to Cloud-Aerosol Lidar with Orthogonal Polarization satellite observations and resolves AEC profiles, surpassing methods retrieving only aerosol layer heights. "Leave-One-Year-Out" validation over East Asia during 2017–2019 demonstrates significant AEC fidelity improvements, increasing R from 0.49–0.53 in the GEOS-Chem simulations to 0.66–0.73 and reducing RMSE by approximately 25 %. The model effectively mitigates over-diffusion, significantly reducing AEC simulation biases in the critical near-surface layer while restoring smoothed biomass burning and dust plumes. Additionally, it exhibits robust cross-continental transferability, reproducing bias patterns over the North American domain (R = 0.70) without retraining, confirming the internalization of universal physicochemical relationships linking atmospheric states to simulation biases. Furthermore, interpretability analysis establishes a feedback loop from data-driven correction to physical model improvement. The model identifies temperature and sensible heat flux as primary drivers to constrain boundary layer mixing, and uses environmental proxies (e.g., vegetation indices) to diagnose deficiencies in dust uplift and secondary aerosol formation. These insights provide a physical basis for refining parameterization schemes in chemical transport models.
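The "Leave-One-Year-Out" protocol and the two headline metrics (R and RMSE) mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `leave_one_year_out`, `pearson_r`, and `rmse` are hypothetical, and the only assumption is that every sample carries a year label.

```python
import numpy as np

def pearson_r(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.corrcoef(pred, obs)[0, 1])

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year.

    Hypothetical helper: each fold trains on all other years and tests on
    the held-out year, so no sample from the test year is seen in training.
    """
    years = np.asarray(years)
    for held_out in np.unique(years):
        is_test = years == held_out
        yield int(held_out), np.flatnonzero(~is_test), np.flatnonzero(is_test)

if __name__ == "__main__":
    sample_years = [2017, 2017, 2018, 2019, 2019]
    for year, train_idx, test_idx in leave_one_year_out(sample_years):
        print(year, train_idx.tolist(), test_idx.tolist())
```

Splitting by year rather than by random sample keeps temporally adjacent, strongly correlated profiles out of the training set of the fold they are tested in.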

Received: 24 Jan 2026 – Discussion started: 17 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 7550 KB)

Supplement (7441 KB)

Status: closed

RC1:
'Comment on egusphere-2026-397', Anonymous Referee #1, 09 Mar 2026

This manuscript presents a sophisticated physics-informed Transformer framework to correct GEOS-Chem aerosol extinction coefficient profiles using CALIOP observations. The study is ambitious, methodologically advanced, and addresses an important problem in bridging chemical transport models (CTMs) and vertically resolved lidar observations. The reported improvements in correlation and RMSE, along with cross-continental transferability tests, are promising. However, several issues require clarification before the scientific contribution and methodological advantage can be properly evaluated.
First, the scientific objective requires clearer framing. CALIOP observations are used to define simulation bias during training, but they are not included as inputs during inference. Therefore, the framework is not performing data assimilation, but rather learning a state-dependent mapping between atmospheric variables and historical GEOS-Chem biases. If the goal is to generate corrected AEC fields when CALIOP is unavailable, the method should be clearly described as a supervised bias-correction model conditioned on CTM state and meteorology, and its limitations should be acknowledged. For example, if key emissions (e.g., wildfire events) are missing in GEOS-Chem and not represented in the input features, the model cannot reconstruct those missing signals. The correction is inherently constrained by the information content of the CTM and meteorological predictors. The manuscript should therefore distinguish more carefully between correcting systematic state-dependent biases and compensating for missing physical processes. Clarifying this distinction would strengthen the scientific positioning of the study.
Second, the model architecture appears to rely on instantaneous vertical profiles and meteorological context, without explicit time-series modeling. It is unclear whether any temporal continuity, lagged predictors, or time-window averaging is incorporated into the inputs. A precise description of the temporal collocation strategy between GEOS-Chem and CALIOP is necessary to assess the robustness of the results. In addition, the manuscript does not discuss how diurnal variability in aerosol vertical structure is handled. Given the strong diurnal cycle of boundary layer evolution, turbulent mixing, hygroscopic growth, and photochemistry, aerosol extinction can vary substantially on hourly timescales. It should be clarified whether simple hour-by-hour matching is sufficient, or whether a temporal window, similar to those used in traditional data assimilation frameworks, was considered to reduce representativeness errors. Without such analysis, it remains uncertain whether the reported improvements reflect stable bias correction or sensitivity to sampling timing and diurnal variability.
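The two collocation options contrasted in this comment, nearest-hour matching versus averaging over a temporal window around the overpass, could be sketched as follows. This is an illustrative sketch only and does not describe what the manuscript actually does. The function names, the `half_width_h` parameter, and the array layout (time along the first axis, vertical level along the second) are all assumptions.

```python
import numpy as np

def nearest_hour_match(model_hours, model_profiles, overpass_hour):
    """Return the model profile at the output time closest to the overpass.

    model_hours:    shape (n_times,), hours since a common reference
    model_profiles: shape (n_times, n_levels), e.g. simulated AEC profiles
    """
    model_hours = np.asarray(model_hours, dtype=float)
    model_profiles = np.asarray(model_profiles, dtype=float)
    idx = int(np.argmin(np.abs(model_hours - overpass_hour)))
    return model_profiles[idx]

def window_mean_match(model_hours, model_profiles, overpass_hour, half_width_h=1.0):
    """Return the mean model profile within +/- half_width_h of the overpass.

    Assumed design: a symmetric, unweighted window. Times are treated as a
    monotonic axis, so wrap-around at midnight is not handled here.
    """
    model_hours = np.asarray(model_hours, dtype=float)
    model_profiles = np.asarray(model_profiles, dtype=float)
    in_window = np.abs(model_hours - overpass_hour) <= half_width_h
    if not in_window.any():
        raise ValueError("no model output falls inside the temporal window")
    return model_profiles[in_window].mean(axis=0)

if __name__ == "__main__":
    hours = np.arange(6)                      # hourly model output
    profiles = np.arange(12.0).reshape(6, 2)  # toy 2-level profiles
    print(nearest_hour_match(hours, profiles, 2.4))
    print(window_mean_match(hours, profiles, 2.4, half_width_h=1.0))
```

Comparing the two outputs for the same overpass gives a rough measure of how sensitive the training target is to sampling time, which is the representativeness question raised above.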
Third, the proposed architecture includes multiple advanced components. While the performance improvements are reported relative to the original GEOS-Chem simulation, there is no comparison with simpler machine learning baselines. It is therefore unclear whether the reported gains arise from the Transformer architecture itself, from the inclusion of additional meteorological predictors, or simply from the supervised bias-learning framework. To justify the methodological novelty, the study should include comparisons with at least one conventional model, such as a multilayer perceptron, a CNN-based model, or a tree-based regression approach. Ideally, ablation experiments isolating the contributions of the cross-attention module and gated fusion mechanism would further demonstrate the necessity of the proposed architecture. Without such benchmarks, it is difficult to assess whether the architectural complexity is warranted.

Citation: https://doi.org/10.5194/egusphere-2026-397-RC1
- AC1: 'Reply on RC1', Yi Wang, 30 Apr 2026
  
  Dear Reviewer,
  Thank you for your constructive and insightful comments, which have significantly helped us strengthen the scientific positioning and methodological clarity of our study. We have carefully addressed your concerns regarding the temporal collocation strategy, the handling of diurnal variability, and the necessity of the proposed Transformer architecture through additional benchmarking experiments.
  Detailed, point-by-point responses to your comments are provided in the Supplementary document. Black color represents the review comments, blue color represents the reply comments, and green color represents the revised contents of the manuscript and supplement. We deeply appreciate your time and effort in reviewing our manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2026-397-AC1
RC2:
'Comment on egusphere-2026-397', Anonymous Referee #2, 11 Apr 2026

This study attempts to minimize biases in the vertical structure of GEOS-Chem aerosol simulations using CALIPSO data. It meets the need for reconstructing spatially continuous aerosol distributions with high-accuracy vertical profiles. Methodologically, the paper proposes a Physics-Informed Transformer framework, explicitly incorporating physical priors through dual-stream inputs, gated feature fusion, and cross-attention mechanisms, thereby overcoming the limitations of traditional CNNs in capturing vertical dependencies of aerosols. Several issues need to be carefully addressed in the manuscript.
(1) The manuscript is too long to read. Please try to reduce redundant text. In particular, the methodology part contains many technical terms that make it extremely difficult to follow. Figure 2 shows the technical framework. However, I do not understand anything except the input layer when looking at this figure. Please add more details in the figure to show the physical meaning of the feature embedding layer, Transformer Encoder, and Cross Attention Layer. The methodology part needs to be rewritten in a way that atmospheric chemists and physicists can understand.
(2) In Section 3.1 (Eq. 1), the learning target is defined as the bias of GEOS-Chem relative to CALIOP. However, Section 2.2 states that CALIOP AOD shows a mean relative bias of −5.1% ± 8.5% against AERONET, and CALIOP backscatter agrees with HSRL within 1.0% ± 3.5%. These results indicate that CALIOP itself contains systematic uncertainties. Consequently, the learned “bias” effectively represents a combination of GEOS-Chem error and CALIOP error. If CALIOP has a negative bias, the model may incorrectly learn a tendency to increase AEC, even in cases where GEOS-Chem is accurate. This issue directly affects the interpretation and reliability of the bias-correction results. Suggestions: (a) Explicitly acknowledge this limitation in Section 2.2 or 3.1, and discuss the potential impact of CALIOP uncertainty on the training target. (b) Add a sensitivity analysis in the Results section (Section 4): quantify how the bias-correction results change if perturbations are applied to the CALIOP inputs.
(3) The manuscript states that the interpretability analysis can provide a solid physical basis for improving GEOS-Chem parameterizations and emission inventories, thereby establishing a feedback loop from “data-driven correction” to “physical mechanism improvement.” However, in Section 3.5, the interpretability framework is limited to feature-sensitivity approaches such as gradient attribution, permutation importance, and SHAP, without explaining how these results can be translated into concrete parameterization adjustments. For example, if SHAP identifies “sensible heat flux” as a dominant driver of the bias, it is unclear which specific GEOS-Chem parameters should be modified (e.g., diffusion coefficients in the PBL scheme, surface flux parameterizations, or others), and how such modifications would be implemented. This missing link weakens the claimed feedback loop and makes the statement appear largely conceptual rather than actionable.
(4) The Introduction appears to be over-cited, which makes it difficult for readers to clearly distinguish foundational studies from more recent developments. It would improve readability and focus to streamline the citations, limiting each statement to approximately three to five representative and/or recent review or key references.
(5) Section 3.5 is divided into 3.5.1 (Dual-Mechanism Attribution), 3.5.2 (Gated Fusion Analysis), and 3.5.3 (Feature Sensitivity and Regional Drivers). However, the Permutation Feature Importance and SHAP analysis in 3.5.3 overlap functionally with the Gradient-based Attribution in 3.5.1—both are essentially feature importance assessments. It is recommended to clearly articulate the complementarity of these three attribution methods: Gradient-based Attribution captures local sensitivity, Permutation Feature Importance provides global ranking, and SHAP analysis handles feature interactions and regional heterogeneity.
(6) Line 198: Explain the specific collocation strategy. How are the two datasets matched in space and time? What level of representativeness error might be introduced by this collocation approach?
(7) Lines 746-748: The lower transfer performance over North America (R = 0.70) compared to East Asia (R = 0.93) is attributed to a shift in aerosol composition regimes (higher SOA fraction in North America versus sulfate–nitrate–dust dominance in East Asia). While this explanation is reasonable, it remains qualitative and lacks supporting evidence. It would be helpful to further evaluate the performance over North America stratified by CALIOP aerosol types.
(8) Abstract: too technical. I suggest adding several sentences at the beginning to introduce the science context and research gap before jumping into technical details.
(9) Figure 3. R between model prediction and what data? What are the units for RMSE and Bias?
(10) Figure 5. Why does India show such negative results?
(11) Figure 6. I would say that the correlation even after correction is not that good. Can you explain where the points that lie far from the 1:1 line are located?
(12) Figure 10. Font size too small.
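As a concrete reference for the "global ranking" role that comment (5) assigns to Permutation Feature Importance, the generic technique can be sketched as below. This is a textbook-style illustration, not the manuscript's implementation. The function name `permutation_importance`, the use of RMSE as the score, and the toy model are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn.

    predict: callable mapping an (n_samples, n_features) array to predictions
    X, y:    evaluation inputs and targets (ideally held-out data)
    A larger value means the model relies more on that feature globally.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def score(pred):
        return float(np.sqrt(np.mean((pred - y) ** 2)))

    baseline = score(predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the target
            # while preserving its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(score(predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_demo = rng.normal(size=(200, 3))
    y_demo = 2.0 * X_demo[:, 0]           # toy target: only feature 0 matters
    toy_model = lambda X: 2.0 * X[:, 0]   # hypothetical stand-in for a trained model
    print(permutation_importance(toy_model, X_demo, y_demo))
```

Because the score is averaged over the whole evaluation set, the result is a single ranking per feature. This differs from gradient-based attribution, which is evaluated per sample, and from SHAP, which distributes each individual prediction across features.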

Citation: https://doi.org/10.5194/egusphere-2026-397-RC2
- AC2: 'Reply on RC2', Yi Wang, 30 Apr 2026
  Dear Reviewer,
  Thank you for your rigorous evaluation and valuable suggestions. In the revised manuscript, we have comprehensively restructured the methodology section to translate the deep learning architecture into explicit atmospheric physical processes (e.g., mapping the self-attention mechanism to vertical turbulent mixing and large-scale advection). To directly address your core concerns, we have conducted several new quantitative analyses:
  - We performed a perturbation-based sensitivity experiment to explicitly quantify the impact of CALIOP observational uncertainties on the correction target.
  - We evaluated the model’s spatial transferability over North America by stratifying the data into specific aerosol subtypes (dust-dominated vs. SOA-dominated regimes), providing a solid physical basis for the domain shift.
  - We expanded the discussion to explicitly bridge the data-driven feature sensitivities (e.g., sensible heat flux and diffuse radiation) with targeted diagnostic refinements in GEOS-Chem's specific parameterization schemes.
  
  Detailed, point-by-point responses to all your comments are provided in the Supplementary document. Black color represents your original comments, blue color indicates our replies, and green color highlights the revised text in the manuscript and supplement. We deeply appreciate the time and effort you have dedicated to reviewing our work.
  
  Citation: https://doi.org/10.5194/egusphere-2026-397-AC2

Supplement

https://doi.org/10.5194/egusphere-2026-397-supplement

Viewed

Total article views: 1,822 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
941	809	72	1,822	359	53	144

Views and downloads (calculated since 17 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	345	240	27	612
Mar 2026	430	409	39	878
Apr 2026	102	119	2	223
May 2026	64	41	4	109

Latest update: 31 May 2026

Short summary

Current models struggle to simulate aerosol extinction profiles accurately. We introduce a physics-informed deep learning framework combining model simulations with satellite data to reconstruct precise three-dimensional aerosol fields. This method significantly reduces biases over East Asia and works effectively in North America without retraining. Crucially, it acts as a diagnostic tool to identify specific physical flaws in models, guiding improvements for climate research.

