the Creative Commons Attribution 4.0 License.
Machine Learning Model for Inverting Convective Boundary Layer Height with Implicit Physical Constraints and Its Multi-Site Applicability
Abstract. Accurate estimation of convective boundary layer height (CBLH) is vital for weather, climate, and air quality modeling. Machine learning (ML) shows promise in CBLH prediction, but input parameter selection often lacks physical grounding, limiting generalizability. This study introduces a novel ML framework for CBLH inversion, integrating thermodynamic constraints and the diurnal CBLH cycle as an implicit physical guide. Boundary layer growth is modeled as driven by surface heat fluxes and atmospheric heat absorption, using the diurnal cycle as input and output. TPOT and AutoKeras are employed to select optimal models, validated against Doppler lidar-derived CBLH data, achieving an R² of 0.84 across untrained years. Comparisons of eddy covariance (ECOR) and energy balance Bowen ratio (EBBR) flux measurements show consistent predictions (R² difference ~0.011, MAE ~0.002 km). Models trained on C1 site ECOR data and tested at E37 and E39 yield R² values of 0.787 and 0.806, respectively, demonstrating adaptability. Training with all sites’ data enhances C1 ECOR and EBBR performance over C1-only training: ECOR (R²: 0.851 vs. 0.845; MAE: 0.198 km vs. 0.207 km), EBBR (R²: 0.837 vs. 0.834; MAE: 0.203 km vs. 0.205 km). Transferability across ARM Southern Great Plains sites and seasonal performance during summer confirm the model’s robustness, offering a scalable approach for improving boundary layer parameterization in atmospheric models.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-2490', Anonymous Referee #2, 21 Aug 2025
- RC2: 'Comment on egusphere-2025-2490', Anonymous Referee #1, 08 Sep 2025
Review of the article titled “Machine Learning model for inverting convective boundary layer height with implicit physical constraints and its multi-site applicability” by Chu and coauthors for publication in Atmos. Chem. Phys.
The authors have used boundary layer (BL) height from the Doppler lidar (DL), surface fluxes from the eddy correlation (ECOR) and energy balance Bowen ratio (EBBR) systems, and thermodynamic stability from the Atmospheric Emitted Radiance Interferometer (AERI) to construct a machine learning (ML) model for predicting BL height. Data from the ARM SGP site and from other ancillary sites around SGP have been used. The main premise of the paper is the use of off-the-shelf interfaces such as TPOT and AutoKeras for training and validation, thereby leaving it to the AI to pick the ML model. After model identification, the authors have applied the model to predict BL height over different seasons and at different sites. The article is relatively well written and fits the journal's scope. However, I believe that the article lacks physical depth and could be improved. Much of the discussion is on simply adapting the data for TPOT and AutoKeras, which is not novel. The paper is also too long at this point; some of the discussion is more suitable for a dissertation than for a paper. I therefore mention a few things below that could improve the article further.
I like that you are trying to use some physical constraints as input parameters to improve the ML model. LTS, time, and sun parameters are a good start (Line 345). However, previous research has shown that the presence of elevated humidity above the BL can also affect BL development through radiative effects, and the same is true for high-level clouds. These effects will in part be reflected in the ECOR fluxes, but with a time delay. In addition, wind speed, wind direction, wind shear and wind veer have also been shown to be very important. So maybe you can include the following parameters as model inputs, as they are also available at the ARM sites: wind speed, wind direction, wind shear, wind veer, and surface upwelling and downwelling longwave and shortwave radiation. Surface meteorological variables would also be good to include. I understand that including cloud properties might be hard, but given the strong expertise of authors Deng, Xue and Wang, they can include ceilometer cloud fraction and cloud base height. This might significantly improve the model, and the Shapley analysis will tell which parameters are important. Thank you.
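As an illustration of how two of the suggested inputs could be derived, the following is a minimal sketch that computes bulk wind shear and wind veer between two measurement heights. The function names, the choice of two levels, and the sign convention for veer (positive for clockwise turning with height) are assumptions made here for illustration; they are not taken from the preprint or from the referee's comment.

```python
import math

def wind_components(speed, direction_deg):
    """Convert speed and meteorological direction (degrees the wind blows FROM)
    into eastward (u) and northward (v) components."""
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v

def bulk_shear(speed_lo, dir_lo, speed_hi, dir_hi, z_lo, z_hi):
    """Magnitude of the vector wind difference per unit height (s^-1)
    between a lower level z_lo and an upper level z_hi (metres)."""
    if z_hi <= z_lo:
        raise ValueError("z_hi must be greater than z_lo")
    u1, v1 = wind_components(speed_lo, dir_lo)
    u2, v2 = wind_components(speed_hi, dir_hi)
    return math.hypot(u2 - u1, v2 - v1) / (z_hi - z_lo)

def veer(dir_lo, dir_hi):
    """Signed change of wind direction with height, wrapped to [-180, 180).
    Positive values mean clockwise turning with height (assumed convention)."""
    return (dir_hi - dir_lo + 180.0) % 360.0 - 180.0
```

For example, `veer(350.0, 10.0)` gives 20.0 rather than -340.0, because the difference is wrapped across north before it is used as a model input.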
The authors have used AutoKeras and TPOT for selecting the best model, which is great. Given the small amount of data used in this work, tree-based models could also be employed in TPOT. It would be good if you could tell us which model and associated hyperparameters were picked by these two frameworks. I cannot tell whether the training was done online and, if so, what the batch sizes etc. were. The hyperparameters should then be scrutinized to understand whether any further improvement can be made. On the same topic, it would be good if the authors could mention whether they explicitly, or the two frameworks implicitly, regularized (normalized, bias-corrected, etc.) the input parameters. I assume the sunrise and sunset times and the time variable (Line 345) could be normalized by 24.
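The normalization by 24 mentioned above can be sketched as follows. This is only an illustration under stated assumptions: times are taken to be local hours in the range 0 to 24, and the function names are hypothetical. The sine/cosine encoding and the daylight fraction are common alternatives added here for comparison; they are not something the referee or the preprint specifies.

```python
import math

def normalize_hour(hour: float) -> float:
    """Scale a time in hours to [0, 1) by dividing by 24."""
    return (hour % 24.0) / 24.0

def cyclic_encode(hour: float) -> tuple[float, float]:
    """Sine/cosine pair, so that 23:59 and 00:00 end up close together
    (an assumed alternative to plain division by 24)."""
    angle = 2.0 * math.pi * normalize_hour(hour)
    return math.sin(angle), math.cos(angle)

def daylight_fraction(hour: float, sunrise: float, sunset: float) -> float:
    """Position within the daylight period: 0 at sunrise, 1 at sunset.
    Values outside [0, 1] indicate night-time samples."""
    if sunset <= sunrise:
        raise ValueError("sunset must be later than sunrise")
    return (hour - sunrise) / (sunset - sunrise)
```

With a sunrise at 06:00 and a sunset at 20:00, a sample at 13:00 has `normalize_hour(13.0)` of about 0.54 and `daylight_fraction(13.0, 6.0, 20.0)` of 0.5.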
Figure 4 onwards: these are nice figures, but maybe you can add another panel showing the time evolution of the difference between DL CBLH and predicted CBLH. This will truly tell if the model accurately captures the daytime evolution.
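One way to build such a difference panel is to group the prediction error by hour of day. The sketch below assumes three equal-length sequences (hour of day, DL-derived CBLH in km, predicted CBLH in km) and defines the difference as predicted minus observed; both the function name and the sign convention are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def hourly_mean_difference(hours, observed_km, predicted_km):
    """Return {hour: mean(predicted - observed)} in km, sorted by hour.
    Positive values mean the model overestimates CBLH at that hour."""
    by_hour = defaultdict(list)
    for h, obs, pred in zip(hours, observed_km, predicted_km):
        by_hour[int(h)].append(pred - obs)
    return {h: mean(diffs) for h, diffs in sorted(by_hour.items())}
```

Plotting the returned values against hour would show whether the error is concentrated in the morning growth phase or in the afternoon.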
Figure 10: The connection between the convective boundary layer and surface fluxes is clear during the summer months, but I cannot tell how it works in the winter months. Can you please elaborate on the number of samples going into this figure, especially for the colder seasons? It is difficult to assess how accurate the surface fluxes might be when the environment is cold and the surface is frozen. Otherwise, you can control for this by using temperature and advection as input parameters. Thank you.
Citation: https://doi.org/10.5194/egusphere-2025-2490-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
916 | 52 | 16 | 984 | 13 | 14 |
The preprint describes a machine-learning model (“Auto-ML”) trained to diagnose the evolution of the convective boundary layer height (CBLH) over one day. Generally, I think the choices described to add physical grounding to the ML model are well motivated, though the paper's description of them as providing ‘implicit physical constraints’ may be a bit of a reach. The paper would be much stronger if it included a baseline method of CBLH prediction; without one, it lacks context for judging the Auto-ML skill.
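To make the idea of a baseline concrete, the sketch below implements one very simple candidate: a mean diurnal cycle (hour-of-day climatology) fitted on training data and scored with MAE. This is an assumed example only; the comment above does not say which baseline the referee has in mind, and all function names here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_diurnal_climatology(hours, cblh_km):
    """Mean CBLH (km) for each hour of day found in the training data."""
    by_hour = defaultdict(list)
    for h, z in zip(hours, cblh_km):
        by_hour[int(h)].append(z)
    return {h: mean(values) for h, values in by_hour.items()}

def predict_climatology(climatology, hours):
    """Look up the climatological CBLH for each requested hour.
    Raises KeyError for an hour that was absent from the training data."""
    return [climatology[int(h)] for h in hours]

def mean_absolute_error(observed, predicted):
    """Mean absolute difference between two equal-length sequences."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))
```

Scoring this baseline on the same held-out years as the ML model would give a reference MAE against which the reported skill could be judged.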
Specific comments:
Other comments: