This work is distributed under the Creative Commons Attribution 4.0 License.
Machine Learning Model for Inverting Convective Boundary Layer Height with Implicit Physical Constraints and Its Multi-Site Applicability
Abstract. Accurate estimation of convective boundary layer height (CBLH) is vital for weather, climate, and air quality modeling. Machine learning (ML) shows promise in CBLH prediction, but input parameter selection often lacks physical grounding, limiting generalizability. This study introduces a novel ML framework for CBLH inversion, integrating thermodynamic constraints and the diurnal CBLH cycle as an implicit physical guide. Boundary layer growth is modeled as driven by surface heat fluxes and atmospheric heat absorption, using the diurnal cycle as input and output. TPOT and AutoKeras are employed to select optimal models, validated against Doppler lidar-derived CBLH data, achieving an R2 of 0.84 across untrained years. Comparisons of eddy covariance (ECOR) and energy balance Bowen ratio (EBBR) flux measurements show consistent predictions (R2 difference ~0.011, MAE ~0.002 km). Models trained on C1 site ECOR data and tested at E37 and E39 yield R2 values of 0.787 and 0.806, respectively, demonstrating adaptability. Training with all sites’ data enhances C1 ECOR and EBBR performance over C1-only training: ECOR (R2: 0.851 vs. 0.845; MAE: 0.198 km vs. 0.207 km), EBBR (R2: 0.837 vs. 0.834; MAE: 0.203 km vs. 0.205 km). Transferability across ARM Southern Great Plains sites and seasonal performance during summer confirm the model’s robustness, offering a scalable approach for improving boundary layer parameterization in atmospheric models.
Status: open (until 15 Sep 2025)
RC1: 'Comment on egusphere-2025-2490', Anonymous Referee #2, 21 Aug 2025
The preprint describes a machine-learning model (“Auto-ML”) trained to diagnose the convective boundary layer height (CBLH) evolution over one day. Generally, I think the choices described to add physical grounding to the ML model are well motivated, though the paper's description of them as providing ‘implicit physical constraints’ may be a bit of a reach. The paper would be much stronger if it included a baseline method of CBLH prediction; without one it lacks context for judging the Auto-ML skill.
Specific comments:
- Simply including LTS and surface fluxes as inputs and using the full day of CBLH as targets does not guarantee that the ML model will learn the correct physical constraints. It is fair to say that these choices introduce more physical grounding into the ML problem setup, but I think that describing these as “Implicit physical constraints” in the title and section 2.5 is too far-reaching.
- My reading of the multisite analyses in 3.3 and 3.6 is that generalizing the model to different sites is limited by differences in the flux inputs and other site-specific factors, and that training the ML model on the site where it is to be used is needed to achieve the best skill. This seems to contradict the abstract (“transferability across ARM Southern Great Plains sites… confirm the model’s robustness”).
- There is no baseline for comparison to assess how much skill the ML models are adding. I suggest including a simple baseline R2 and MAE, calculated using the training-set mean CBLH target (over the full time range, and also seasonally for that analysis), and including this baseline in the skill figures and tables. This would add context for how much of an improvement the AutoML model is providing; a minimal sketch of such a baseline appears after this list.
- In the description of input and output data in Sec 2.6, I would add a sentence explicitly stating the dimensionality of the input and output data. Related to this point, it sounds like, aside from sunrise and sunset times, each input has (n_timestamps_in_day) values in the full input vector. However, later in the interpretability section, single importance scores are given for each input, which confused me. Are SHAP values calculated for each timestamp of an input and averaged together? Please clarify in the text; one plausible averaging scheme is sketched after this list.
- What was the best model out of the set in Table 2 chosen by the AutoML? This should be added to the text. Was it one of the two models in section 3.4? Did any other models in Table 2 also have comparably good skill, or were some significantly worse? Some discussion of the best-performing architecture is warranted, as it could relate to the model's ability to generalize; e.g., one would expect a tree-based model to have difficulty generalizing, as its output distribution is bounded by its training set (illustrated in a sketch after this list).
- The methods section should include some information about the computational resources used in training. This affects the space of model hyperparameters that can be explored by the Auto-ML algorithm. In particular, the tree depth in the tree-based methods is directly related to the distribution of possible model outputs.
- In the interpretability section, there should be some discussion of whether the results were surprising or expected given prior knowledge of boundary layer processes. E.g. “In spring and autumn, while a comparable pattern exists, the differences between predicted and observed values are smaller, suggesting lower variability (or complexity) in meteorological conditions compared to summer." and “Potential reasons include:... distinct entrainment processes in summer compared to other seasons”. I am not familiar with boundary layer processes, so for readers like me: Is it implied that it is already known that summer has lower variability in conditions and distinct entrainment processes, or are those the authors’ hypotheses to explain their findings?
- I appreciate the breakdown of the results into the seasonal comparisons in section 3.5.2 and discussion of the physical processes affecting the CBLH and its variability. Here and in other sections, I think the writers did a good job of explaining how the physical processes involved in boundary layer changes might explain their findings.
- The readability would be greatly improved if the main text section related to importance/interpretability focused only on the main takeaway (LTS dominates) and left the rest to an appendix. Similarly for the section about ECOR vs. EBBR flux results; I did not feel those findings were salient to the main points of the paper.
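To make the baseline suggestion concrete, here is a minimal sketch; the array names y_train and y_test are hypothetical placeholders for the DL-derived CBLH targets, not anything from the paper:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

# Hypothetical arrays: DL-derived CBLH targets (km) for the train and test periods.
y_train = np.array([0.9, 1.4, 1.8, 1.1])   # placeholder values
y_test = np.array([1.0, 1.6, 1.3])         # placeholder values

# Climatological baseline: always predict the training-set mean CBLH.
baseline_pred = np.full_like(y_test, y_train.mean())

print("baseline R2 :", r2_score(y_test, baseline_pred))   # <= 0 by construction
print("baseline MAE:", mean_absolute_error(y_test, baseline_pred), "km")
```

Any R2 the AutoML model achieves can then be read against this zero-skill reference.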
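And a sketch of the per-timestamp SHAP averaging asked about above — this is my guess at one plausible procedure, not the authors' confirmed method; the input layout and variable names are hypothetical:

```python
import numpy as np
import shap
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical layout: X has shape (n_samples, n_inputs * n_timestamps), with the
# columns of each input variable grouped together (e.g. all LTS timestamps first).
n_timestamps = 24
rng = np.random.default_rng(0)
X = rng.random((200, 3 * n_timestamps))   # 3 dummy input variables
y = rng.random(200)

model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)   # (n_samples, n_features)

# Collapse per-timestamp SHAP values into a single score per input variable
# by averaging the mean absolute SHAP value over that input's columns.
for i, name in enumerate(["LTS", "H", "LE"]):            # hypothetical input names
    cols = slice(i * n_timestamps, (i + 1) * n_timestamps)
    print(name, np.abs(shap_values[:, cols]).mean())
```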
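Finally, the extrapolation concern in the architecture comment can be demonstrated in a few lines: a tree-based regressor never predicts outside the range of its training targets.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(500, 1))
y_train = 0.5 + X_train.ravel()            # training targets span 0.5-1.5 km

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Inputs far outside the training range still yield predictions capped near
# the maximum training target (~1.5 km): trees cannot extrapolate.
print(model.predict(np.array([[2.0], [5.0]])))
```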
Other comments:
- Please define the variables in equation 4.
- Hyperparameters for the ExtraTreesRegressor in Sec 3.4 should be provided; see the sketch after this list for one way to extract and report them.
- Why is only JJA used in the comparison of the different ML methods in 3.4? Is it because the authors specifically wanted to study the season with higher DL-derived CBLH variability? Please clarify in the text.
- Table 4: What is being shown in the rows labeled by the inputs? Feature importance? Please clarify in the caption.
- In the conclusion, L849 states the ML model “significantly improves the accuracy and generalizability of CBLH predictions across diverse sites and seasons.” This ought to be edited, as without a baseline for comparison it is unclear what this improvement is relative to.
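On reporting the ExtraTreesRegressor hyperparameters, one simple route is sketched below; the settings shown are placeholders, and the values actually chosen by the AutoML search should be dumped instead:

```python
from sklearn.ensemble import ExtraTreesRegressor

# Placeholder settings; substitute the fitted model chosen by TPOT/AutoKeras.
model = ExtraTreesRegressor(n_estimators=100, max_depth=None,
                            min_samples_split=2, random_state=0)
print(model.get_params())   # full hyperparameter dictionary, ready for an appendix
```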
Citation: https://doi.org/10.5194/egusphere-2025-2490-RC1
RC2: 'Comment on egusphere-2025-2490', Anonymous Referee #1, 08 Sep 2025
Review of the article titled “Machine Learning model for inverting convective boundary layer height with implicit physical constraints and its multi-site applicability” by Chu and coauthors for publication in Atmos. Chem. Phys.
The authors have used boundary layer (BL) height from the Doppler lidar (DL), surface fluxes from eddy correlation (ECOR) and energy balance Bowen ratio (EBBR) systems, and thermodynamic stability from the Atmospheric Emitted Radiance Interferometer (AERI) to construct a machine learning (ML) model for predicting BL height. Data from the ARM SGP site and other ancillary sites around SGP have been used. The main premise of the paper is using off-the-shelf interfaces like TPOT and AutoKeras for training and validation, thereby leaving the AI to pick the ML model. After model identification, the authors have applied the model to predict BL height over different seasons and at different sites. The article is relatively well written and fits the journal's scope. However, I believe that the article lacks physical depth and could be improved. Much of the discussion is on simply adapting the data for TPOT and AutoKeras, which is not novel. The paper is also too long at this point; some of the discussion is more suitable for a dissertation than for a paper. So, I mention a few things below that can improve the article further.
I like that you are trying to use some physical constraints as input parameters to improve the ML model. LTS, time, and sun parameters are a good start (Line 345). However, previous research has shown that the presence of elevated humidity above the BL can also affect BL development through radiative effects, and the same is true for high-level clouds. These effects will in some part be reflected in the ECOR fluxes, but with a time delay. In addition, wind speed, wind direction, wind shear, and wind veer have also been shown to be very important. So maybe you can include the following parameters in your input models, as they are also available at the ARM sites: wind speed, wind direction, wind shear, wind veer, and surface upwelling and downwelling longwave and shortwave radiation. Surface meteorological variables would also be good to include. I understand that including cloud properties might be hard, but given the strong expertise of authors Deng, Xue, and Wang, they could include ceilometer cloud fraction and base height. This might significantly improve the model, and the Shapley analysis will tell which parameters are important. Thank you.
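To make the suggested wind inputs concrete, a minimal sketch (my own illustration; the function and variable names are hypothetical) of deriving bulk shear and veer between two measurement levels:

```python
import numpy as np

def shear_and_veer(u1, v1, u2, v2, dz):
    """Bulk wind shear (s^-1) and signed veer (deg) between two levels dz metres apart."""
    spd1, spd2 = np.hypot(u1, v1), np.hypot(u2, v2)
    shear = (spd2 - spd1) / dz
    # Meteorological wind direction (degrees the wind blows FROM)
    dir1 = np.degrees(np.arctan2(-u1, -v1)) % 360.0
    dir2 = np.degrees(np.arctan2(-u2, -v2)) % 360.0
    veer = ((dir2 - dir1 + 180.0) % 360.0) - 180.0   # wrapped to (-180, 180]
    return shear, veer

# Example: winds at two levels 50 m apart
print(shear_and_veer(u1=2.0, v1=1.0, u2=5.0, v2=-1.0, dz=50.0))
```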
The authors have used AutoKeras and TPOT for selecting the best model, which is great. Given the small amount of data used in this work, tree-based models could also be employed in TPOT. It would be good if you could tell us which model, and which associated hyperparameters, were picked by these two frameworks. I cannot tell if the training was done online, and if so, the batch sizes, etc. The hyperparameters should then be scrutinized to understand whether any further improvement can be made. On the same topic, it would be good if the authors could mention whether they explicitly, or the two frameworks implicitly, regularized (normalized, bias-corrected, etc.) the input parameters. I assume the (Line 345) sunrise and sunset times and the time variable could be normalized by 24.
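A sketch of the normalization mentioned above, plus a common cyclical-encoding alternative; the latter is my own addition, not a claim about what the authors did:

```python
import numpy as np

# Hypothetical raw features: decimal hours in [0, 24)
sunrise_h, sunset_h, time_h = 6.25, 19.75, 13.0

# The simple scaling suggested above: divide by 24
sunrise_n, sunset_n, time_n = sunrise_h / 24.0, sunset_h / 24.0, time_h / 24.0

# A common alternative: cyclical encoding, which keeps 23:59 and
# 00:01 close together in feature space.
time_sin = np.sin(2.0 * np.pi * time_n)
time_cos = np.cos(2.0 * np.pi * time_n)
print(sunrise_n, sunset_n, time_sin, time_cos)
```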
Figure 4 onwards: these are nice figures, but maybe you can add another panel showing the time evolution of the difference between the DL CBLH and the predicted CBLH. This would show more directly whether the model accurately captures the daytime evolution.
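A plotting sketch for such an extra panel, using synthetic placeholder series in place of the real DL and predicted CBLH:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical diurnal series (km); replace with DL-derived and predicted CBLH.
times = np.arange(6, 19, 0.5)
cblh_dl = 0.2 + 1.3 * np.sin(np.pi * (times - 6) / 13)
cblh_pred = cblh_dl + 0.1 * np.random.default_rng(1).normal(size=times.size)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))
ax1.plot(times, cblh_dl, label="DL CBLH")
ax1.plot(times, cblh_pred, label="Predicted CBLH")
ax1.set_ylabel("CBLH (km)")
ax1.legend()

# Extra panel: time evolution of the prediction error
ax2.plot(times, cblh_pred - cblh_dl, color="k")
ax2.axhline(0.0, ls="--", lw=0.8)
ax2.set_xlabel("Local time (h)")
ax2.set_ylabel("Predicted - DL (km)")
plt.tight_layout()
plt.show()
```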
Figure 10: The connection between the convective boundary layer and surface fluxes is clear during summer months, but I cannot tell how it works in winter months. Can you please elaborate on the number of samples going into this figure, especially for the colder seasons? It is difficult to assess how accurate the surface fluxes might be when the environment is cold and the surface is frozen. Otherwise, you could control for this by using temperature and advection in your input parameters. Thank you.
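Reporting per-season sample counts is straightforward; a sketch with a synthetic DataFrame standing in for the real CBLH record:

```python
import numpy as np
import pandas as pd

# Hypothetical record: daily CBLH values (km) on a datetime index.
idx = pd.date_range("2019-01-01", "2019-12-31", freq="D")
df = pd.DataFrame({"cblh": np.random.default_rng(0).random(idx.size)}, index=idx)

season = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
counts = df.groupby(df.index.month.map(season))["cblh"].count()
print(counts)   # per-season sample counts to report alongside the figure
```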
Citation: https://doi.org/10.5194/egusphere-2025-2490-RC2