A Study on the Transformer-CNN Imputation Method for Turbulent Heat Flux Dataset in the Qinghai-Tibet Plateau Grassland

Hou, Quanzhe; Gao, Zhiqiu; Duan, Zexia; Yu, Minghui

doi:10.5194/egusphere-2023-2685

Preprints

08 Jan 2024

Abstract. Based on the turbulent heat flux from the third scientific expedition to the Qinghai-Tibet Plateau in 2012, imputation evaluations were conducted using algorithms such as Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Transformer model with a deep self-attention mechanism. Results indicated that the Transformer model performed best. To further enhance imputation accuracy, a combined model of Transformer and Convolutional Neural Network (CNN), termed Transformer_CNN, was proposed. In this model, the Transformer primarily focuses on global attention, while the convolution operations in the CNN provide local attention. Experimental outcomes revealed that the imputations from Transformer_CNN surpassed the traditional single artificial intelligence model approaches. The coefficient of determination (R²) reached 0.949 in the sensible heat flux test set and 0.894 in the latent heat flux test set, thereby confirming the applicability of the Transformer_CNN model for data imputation of turbulent heat flux in the Qinghai-Tibet Plateau. Ultimately, the turbulent heat flux observational database from 2007 to 2016 at the station was imputed using the Transformer_CNN model.

Received: 13 Nov 2023 – Discussion started: 08 Jan 2024

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

29 Jul 2025

Interpolating turbulent heat fluxes missing from a prairie observation on the Tibetan Plateau using artificial intelligence models

Quanzhe Hou, Zhiqiu Gao, Zexia Duan, and Minghui Yu

Geosci. Model Dev., 18, 4625–4641, https://doi.org/10.5194/gmd-18-4625-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-2685', Anonymous Referee #1, 07 Feb 2024
The aim of this study is to build and validate an original method for reconstructing missing data on turbulent heat fluxes using eddy covariance (EC) at the QOMS station in Tibet. Time series covering a 10-year period and presenting gaps are supplemented by methods independent of the physical relationships between fluxes and environmental variables. These methods are based on machine learning (ML), and use continuous time series of 16 to 18 environmental variables at the same site. A validation with statistical indicators shows a good performance of all methods, and a superiority of the method built by the authors, based on Transformer (Transformer_CNN).
The article is interesting, and the method used provides very conclusive results of reconstructed fluxes compared with EC observations, over fairly long periods of time. However, in my opinion, it lacks context and details on certain aspects of the method. Some choices of methods are not sufficiently justified, and the conclusion could usefully include a discussion of the use of the method on other flux datasets or of its limitations. The advantage of using a purely ML-based approach (without information on the physical links between fluxes and variables) should also be discussed. Below are my main questions/comments:
Major comments
1- There is an overall lack of general context and discussion on the added value of the Transformer_CNN method: why do the authors directly use the ML-based reconstruction without trying a more classical (physics-based) method of flux computation? This paper is submitted to GMD as a ‘development and technical paper’ and, as such, it should clearly assess the performance of the model presented with respect to existing methods (not limited to ML-based gap filling). If the aim of the study is rather, as stated l. 93-94, to ‘complete the imputation of turbulent heat flux for this site spanning from 2007 to 2016 and make this dataset publicly accessible’, this study would be more usefully published as a ‘data paper’ in a dedicated journal. Can the authors explain why a basic flux computation algorithm is not usable here? If so, is it due to the different regimes of atmospheric conditions and soil covers encountered at the site throughout the year? Also, can you add information about the significance of the differences between the methods, based on the statistical indicators used in part 4? The SVM method already provides very good results in my view. Is the difference between the SVM and Transformer_CNN MAE significant? Or the distance between the SVM and Transformer_CNN positions in the Taylor diagram in Fig. 5a? The SVM method is simple and ready to use; what is the added value of developing a new, ML-based method for gap filling time series of EC fluxes? This added value could be efficiently demonstrated by a comparison of the resulting time series (by adding the SVM reconstruction to Fig. 8 for instance, or to a close-up of it). It would also be of interest to identify the time periods where ML algorithms yield different results from physical parameterization, to demonstrate the contribution of ML, and possibly to discuss the reasons for this discrepancy.
2- The physical meaning of the results is probably worth a discussion. In 2.3, what is the physical meaning of the variables ranking first for H/LE? Please comment on why the numbers of variables, and the variables themselves, are different for H and LE. Some correlations between subgroups of variables are probably rather high (e.g. Ta_2m and Ta_1.5m, RH_1.5m and RH_2m): could the same results be obtained with fewer variables? Not all sites provide measurements of the soil temperature between the surface and 4 m, or air temperature between 1.5 and 10 m. Could the same (very good) fit be obtained with measurements at first levels (Ts 0m and Ta 1.5m) only?
3- Can the method presented here be used directly or with adaptation at different sites? It would be really interesting to add sensitivity tests of the importance of the different variables selected in the H (18) and LE (16) subgroups. Would the fit be significantly lower when excluding RH_2m and/or RH_4m from the H subgroup?
4- The preprocessing part is insufficiently explained. Why is it relevant to use random forest (RF) rather than a principal component analysis (PCA) to select relevant variables? PCA is based on linear correlation between variables. Are you sure that the linear model is not appropriate here? Did you check that a simple correlation / covariance analysis does not lead to the same results as in Fig. 3? Also, can you please explain briefly the meaning of the ‘importance ranking’ (Fig. 3 a and b) and, more importantly, of the ‘OOB score’ for the readers not familiar with the RF terminology?
5- The building of the Transformer_CNN method itself is insufficiently detailed for the readers not familiar with ML in general. Is it something new, built on purpose for the present study, or has it been used previously? In the former case, can you please elaborate on the reasons leading to the choice of the different steps and modules used? In the latter case, please provide some references. It would also be interesting to study the attention weights determined by the transformer to analyze the causal link between certain variables and the reconstructed fluxes. This would provide a perspective on the physical interpretability of the transformer performance.
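The simple correlation check raised in comment 4, together with the redundancy question in comment 2, can be sketched as follows. This is only a minimal illustration in plain Python, not the authors' preprocessing: the variable names are taken from the review, the numbers are made up, and `pearson` and `rank_by_abs_correlation` are hypothetical helpers.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_abs_correlation(predictors, target):
    """Sort predictor names by |r| with the target, strongest first."""
    scores = {name: pearson(vals, target) for name, vals in predictors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up samples for illustration only (not QOMS observations).
predictors = {
    "Ta_1.5m": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "Ta_2m": [1.8, 3.9, 5.7, 7.9, 9.6, 11.8],  # nearly redundant with Ta_1.5m
    "RH_2m": [60.0, 58.0, 61.0, 40.0, 45.0, 42.0],
}
H = [10.0, 30.0, 45.0, 80.0, 95.0, 120.0]  # hypothetical sensible heat flux

# Linear ranking of predictors against the flux.
ranking = rank_by_abs_correlation(predictors, H)
# Correlation between two predictors: a value near 1 flags redundancy.
redundancy = pearson(predictors["Ta_1.5m"], predictors["Ta_2m"])
```

A ranking of this kind only captures linear dependence, so agreement or disagreement with the RF importance ranking in Fig. 3 would itself indicate how much nonlinearity the RF is picking up.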
Minor comments
Abstract: the RF is not part of the methods evaluated with SVM and so on. Please reformulate.
What is the sampling of the variables used to reconstruct the fluxes? I guess hourly samples (l. 210 and for the study of the diurnal cycle), but daily values are probably used in Fig. 2, Fig 6 and Fig 9? Please specify.
Figure 2: the legend is flawed and incomplete, please complete.
Table 1: please discuss the statistics and trends provided here. Are these figures useful for the main analysis presented in the paper?
Please use the same acronyms for the variables throughout the text, tables and figure legends (e.g. table 1 and table 2).
Please define MAE, RMSE, FC.
Some references are not correctly cited in the text (l. 39).
l. 181: please explain where this value of 159 comes from.
l. 378-388: this analysis is interesting and could be completed/made more specific by making explicit reference to the variables influencing the flux variations (Fig. 3).
Tables 4 and 5, and Fig. 5 are somewhat redundant. Figure 5 can probably be moved to an appendix.
Figure 6 is ill-designed. Please use the same scale for x and y axes and enlarge to be sure to include all the data.
Figure 8 is not very clear: I guess that the reconstructed values (red) are masked by the observations (purple) when both are present. Plotting the reconstructed values above the observed ones would probably make it clearer.
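For reference, the standard definitions of the mean absolute error (MAE), the root-mean-square error (RMSE) and the coefficient of determination (R²) can be written out as below. This is a sketch of the usual textbook formulas with made-up numbers; the manuscript may define R² differently (for example as a squared correlation), and FC is not defined anywhere in this discussion, so it is left out.

```python
import math

def mae(obs, pred):
    """Mean absolute error: average of |obs - pred|."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root-mean-square error: square root of the mean squared residual."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Made-up observed and imputed flux values, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]

scores = {"MAE": mae(obs, pred), "RMSE": rmse(obs, pred), "R2": r2(obs, pred)}
```

MAE and RMSE carry the units of the flux, whereas R² is dimensionless, which is why the three are usually reported together.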
Citation: https://doi.org/10.5194/egusphere-2023-2685-RC1
- AC1: 'Reply on RC1', Quanzhe Hou, 01 Apr 2024
  
  Firstly, I wish to extend my apologies to you. Due to personal reasons, it is only now that I am able to respond. We received your review comments on our manuscript on February 7, 2024, for which we are deeply grateful. We are particularly appreciative of the thorough and constructive feedback you provided on our manuscript titled "Study on Transformer-CNN Imputation Methods for Grassland Turbulent Heat Flux Datasets in the Tibetan Plateau". To date, we have carefully revised our manuscript in accordance with your invaluable suggestions and have addressed all the comments. Details of our responses to the comments can be found in the PDF file attached.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2685-AC1
RC2: 'Comment on egusphere-2023-2685', Anonymous Referee #2, 19 Feb 2024

The paper compares a few machine/deep learning (ML/DL) methods for imputing sensible (H) and latent (L) heat fluxes. The driving variables used for this purpose are meteorological variables such as temperature. Overall, the results are quite promising in that one of the DL methods used in the paper is quite good. But the paper is not ready for publication yet.
Major comments:
(1) The methodology is not described adequately. All the ML/DL models have several hyper-parameters that need to be carefully selected to develop an optimal model. For example, how many LSTM layers were used? How many neurons in each LSTM layer? What was the learning rate used? There is some randomness in the training of DL models; therefore, several models (8-10) are developed with different random seeds. Was this method adopted in the paper? I know that randomness can be quite significant at least for LSTMs. As another example, how were the widths determined in convolutional layers? These considerations are really important.
(2) RF has been used for feature selection. Why? If there are some redundant features, those would be taken care of by the ML/DL algorithms except the KNN method. Also, why not include RF as one of the ML algorithms to impute the H and L values? RF is surely better than the KNN method. Also, note that RF can be thought of as an adaptive KNN, so it seems better to use RF than KNN.
(3) The driving variables in this study also had missing values which were imputed using the KNN method. First of all, the accuracy of the KNN method in imputing these driving variables needs to be established. Second, why not try other methods such as random forest for imputing the driving variables? Also, why select 3 nearest neighbors? Were other combinations tried? The method to compute distance in the KNN approach has not been described either.
(4) The methodology for testing the different DL/ML methods is not rigorous enough. A total of 10 years of data are used, where 9 years of the data are used for training and 1 year (year 2012) is used for testing. This methodology should be repeated with each year in turn as the testing year. Basically, use the data from 2007 as the test data and the rest of the data for training. Then, use 2008 as the testing year and the rest for training, and so on.
(5) The generalizability of the results is unclear. Mainly several different meteorological variables are used as driving variables. Would these variables be available for other sites also? If not, the method cannot be generalized. It would be nice to see the model performance with different sets of predictor variables. Say, if only surface temperature data are available as the driving variable, how good would the imputation be?
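The questions in comment (3) about the number of neighbors and the distance measure can be made concrete with a small sketch of KNN imputation. It assumes a Euclidean distance computed over the columns that both rows have observed, scaled by the number of shared columns, and features that are already on comparable scales; the manuscript's actual distance and scaling are not described in this discussion. `knn_impute` is a hypothetical helper and the rows are made up.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the columns observed in
    both rows; rows sharing no observed column with the target are skipped.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])  # nearest donors first
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Made-up rows of two driving variables; the last row has a gap in column 1.
rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [10.0, 100.0],
    [2.5, None],
]
filled_k3 = knn_impute(rows, k=3)
filled_k2 = knn_impute(rows, k=2)
```

Running the same gap with k=2 and k=3 gives different fills here, which is the kind of sensitivity test the comment asks for.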
I have other specific comments in the attached pdf.

Citation: https://doi.org/10.5194/egusphere-2023-2685-RC2
- AC2: 'Reply on RC2', Quanzhe Hou, 01 Apr 2024
  
  Firstly, I would like to extend my sincerest apologies to you. Due to personal reasons and the need to incorporate a substantial amount of experimental data, it has taken me until now to be able to respond. We received your review comments on our manuscript on February 19, 2024, for which we are profoundly grateful. We deeply appreciate the comprehensive and constructive feedback you provided on our research titled "A Study on the Transformer-CNN Imputation Method for Turbulent Heat Flux Dataset in the Qinghai-Tibet Plateau Grassland". Up to this point, we have meticulously revised our manuscript in accordance with your invaluable suggestions and have addressed all the comments raised. For detailed responses to these comments, please refer to the attached PDF document.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2685-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Quanzhe Hou on behalf of the Authors (05 May 2024) Author's response Author's tracked changes

EF by Sarah Buchmann (16 May 2024) Manuscript

ED: Referee Nomination & Report Request started (16 May 2024) by Le Yu

RR by Anonymous Referee #2 (01 Jun 2024)

RR by Anonymous Referee #1 (05 Jun 2024)

Suggestions for revision or reasons for rejection

All my main comments about the previous version have been addressed in a more or less complete way. The authors now do a very good job of presenting the methods they use, why they chose them and their advantages with respect to more classical imputation methods (which were added in the comparison). The paper is much clearer and more comprehensive this way. It is rather well written and reads easily. Thank you for that.
I still have some minor comments below.
Main comments:
While the paper is generally well written, I feel that its level of English could be improved (I am not a native speaker myself). For instance, the sentences are long, especially in the introduction and conclusion.
The addition of tests of the same method with fewer parameters and on other measurement sites (SI) is nice and is a significant improvement of the paper. If properly discussed with respect to the main results, it will broaden the conclusions and the applicability of the Transformer_CNN method. Please discuss this (comparison of use with all parameters, restricted set of parameters at QOMS, restricted set of parameters at other site) in the main text, maybe at the end of the Results part.
Additional comments:
l. 117: higher —> high? the sentence sounds weird, please rephrase
Table 1: please check/homogenize scientific notation
l. 165: Euclidenian —> Euclidean?
l. 173: I am not sure reliability is the right word here: utility?
l. 240 and following: the Tibetan plateau is highly variable in its geography, which makes the observation site unique —> discuss in the applicability of the method
Section 3: maybe rephrase experiments and methods, with an opening sentence like "we present here the experimental design and the different statistical and learning methods used in this study"
l. 293: ReLU? it is defined in the SI, please specify
l. 304: He initialization? reference?
Figure 4: several terms (GELU, loss functions) are not defined in the main text but in the SI, please make reference to the SI.
Please check the spelling of citations throughout the paper (Swinbank and W.C., 1951, Bishop and M 2006)

ED: Reconsider after major revisions (08 Jul 2024) by Le Yu

AR by Quanzhe Hou on behalf of the Authors (12 Aug 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Aug 2024) by Le Yu

RR by Anonymous Referee #1 (21 Aug 2024)

RR by Anonymous Referee #3 (30 Aug 2024)

Suggestions for revision or reasons for rejection

The authors have resolved most of the questions from the reviewers. The following comments are based on my experience with ML models. There are some aspects that need clarification before publication in GMD. Please see my following comments.
Figure 1 is not clear. I would suggest recreating it. The authors could use an open-source library to show the location of the site and the elevation of the Tibet area.
Figure 2. Add a description of the horizontal line, which could be the linear trend or the MK equation as in Table 1 for each variable. It would be helpful to denote the slope (e.g., degrees per decade). As for Table 1, what is the fitting equation of the MK test? Is the MK test only a statistical test that does not provide equations? Are these equations estimated from least squares regression or the Theil–Sen estimator? Please add more clarification here.
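On the question about the MK test: the Mann-Kendall test is a rank-based test on the signs of pairwise differences and does not by itself yield a fitted line, so a trend equation reported alongside it usually comes from a separate estimator such as the Theil–Sen (Sen's) slope. The sketch below shows both on a made-up annual series with one outlier. Which estimator the authors used for Table 1 is not stated in this discussion, and `mann_kendall_s` and `theil_sen` are hypothetical helper names.

```python
from statistics import median

def mann_kendall_s(series):
    """Mann-Kendall S statistic: increasing pairs minus decreasing pairs."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s

def theil_sen(times, series):
    """Theil-Sen slope (median of pairwise slopes) with a median intercept."""
    slopes = [
        (series[j] - series[i]) / (times[j] - times[i])
        for i in range(len(series) - 1)
        for j in range(i + 1, len(series))
        if times[j] != times[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * t for t, y in zip(times, series)])
    return slope, intercept

# Made-up annual means with a trend of 0.05 per year and one outlier in 2011.
years = list(range(2007, 2017))
values = [2.0 + 0.05 * (y - 2007) for y in years]
values[4] = 5.0

s_stat = mann_kendall_s(values)
slope, intercept = theil_sen(years, values)
slope_per_decade = 10.0 * slope  # the "per decade" form suggested for Figure 2
```

Because the slope is a median of pairwise slopes, the single outlier barely moves it, whereas a least squares fit to the same series would be pulled toward the outlier.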
Line 184. “Before training the model…”. Please clarify which model. KNN and Random forest are also ML models.
Following the comments from previous reviewers, the use of KNN and RandomForest requires further justification. If my understanding is correct, KNN is first used to fill missing values (line 157) with the number of nearest neighbors set to 3. Then it is used as a comparison to all the other DL models (e.g., LSTM, Transformer). This seems like a two-step KNN. Please clarify it in the text. In addition, LSTM handles time series as input, whereas RandomForest does not take time dependency into consideration (theoretically we can input a time series into RandomForest, but from Figure 3, it seems each variable is used for one time step). How can the feature importance from this RF be used to compare with LSTM (e.g., if Prec_h has a strong effect but with a time lag, it will not show in Figure 3 but is actually important for LSTM)?
In addition, if RandomForest has already been chosen to predict heat fluxes, why not use it as a benchmark model for the following comparisons? If its skill is not optimal, why would we trust its feature importance results?
Line 216. Is SVM used as a classifier or a regressor here?
Line 269. Is the CNN layer also used before LSTM and GRU? Please clarify it in the model description section.
I may have missed some parts, but what is the input time window size to the ML and DL models (i.e., how many time steps are used as input? What is the sequence length)? For traditional ML models (SVM, etc.), do they take input variables as a time series, or at a concurrent time step? If the latter, they have less information than the RNN models, which is not a fair comparison.
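The distinction drawn here between sequence input and concurrent-time-step input can be illustrated with a sliding-window sketch. It assumes that a sequence model sees the `seq_len` most recent feature rows for each target, while a pointwise model sees only the row at the target time (`seq_len=1`). `make_windows` and `seq_len` are hypothetical names, the data are made up, and the window length actually used in the manuscript is not stated in this discussion.

```python
def make_windows(features, targets, seq_len):
    """Pair each target at time t with the seq_len feature rows ending at t."""
    windows, labels = [], []
    for t in range(seq_len - 1, len(features)):
        windows.append(features[t - seq_len + 1 : t + 1])
        labels.append(targets[t])
    return windows, labels

# Made-up series: 6 time steps, 2 driving variables per step.
features = [[float(t), float(t) * 10.0] for t in range(6)]
targets = [float(t) * 100.0 for t in range(6)]

# Sequence input (e.g. for an RNN-type model): 3 past-to-present steps.
seq_windows, seq_labels = make_windows(features, targets, seq_len=3)

# Concurrent-step input (e.g. for a pointwise regressor): 1 step per sample.
point_windows, point_labels = make_windows(features, targets, seq_len=1)
```

One way to level the comparison would be to flatten each multi-step window into a single feature vector for the pointwise models, so that every model receives the same history.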
Following the previous comment, are we looking at hourly predictions in Fig 6 and daily in Fig 8? Please clarify it in the caption.
Fig. 4. What does “view” represent here? Does it mean the model has two outputs? In the responses, the authors mentioned “contrastive learning”, which is not in the main text. If this is “contrastive learning”, a reference is needed here.
In the authors' responses, a cross-validation test is added. However, instead of separating the predictions by year, the authors should use the concatenated predictions to assess the performance (i.e., predictions from all years vs. targets from all years).
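
The cross-validation comment above can be illustrated with a minimal sketch that contrasts per-year scoring with scoring on the concatenated held-out predictions. The fold values, the function names `r2_score` and `pooled_r2`, and the two-fold layout are all hypothetical and are not taken from the manuscript.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pooled_r2(folds):
    """Concatenate held-out targets and predictions from all folds, then score once."""
    all_true = [t for y_true, _ in folds for t in y_true]
    all_pred = [p for _, y_pred in folds for p in y_pred]
    return r2_score(all_true, all_pred)


# Hypothetical held-out folds, one per test year: (observed flux, predicted flux).
folds = [
    ([10.0, 20.0, 30.0], [12.0, 19.0, 29.0]),
    ([100.0, 110.0, 120.0], [104.0, 109.0, 118.0]),
]

per_year = [r2_score(y_true, y_pred) for y_true, y_pred in folds]
pooled = pooled_r2(folds)
```

In this toy case the pooled score exceeds both per-year scores, because the variance between years enters the total sum of squares. The two numbers therefore answer different questions, and it helps to say which one is reported.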

Overall, this is a good paper for GMD, but requires more details and revision.

RR by Anonymous Referee #4 (08 Sep 2024)

RR by Anonymous Referee #5 (13 Sep 2024)

RR by Ye Liu (17 Sep 2024)

Suggestions for revision or reasons for rejection

Review of “A Transformer-CNN Imputation Model for Turbulent Heat Fluxes in the Tibetan Plateau Grassland” by Hou et al.

This study investigates the use of various traditional statistical methods and machine learning algorithms, including mean diurnal variation (MDV), nonlinear regression (NR), look-up tables (LUT), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Transformer models, to impute missing values in measured sensible and latent heat fluxes. Among the methods tested, the Transformer model generally outperformed the others. Furthermore, when integrated with a Convolutional Neural Network (CNN), the combined Transformer_CNN approach surpassed the performance of the traditional single-method approaches.
This study presents a generalized framework for imputing missing observational data, resulting in a complete and more reliable dataset. The manuscript is well-organized and written clearly. However, I have a few minor suggestions for the authors’ consideration:
1. In Section 2.3 Data Preprocessing, the topic sentence states that K-NN is used to interpolate missing values in environmental driving variables. However, lines 164–165 suggest that KNN imputation is applied to estimate missing values in sensible and latent heat fluxes. Clarifying the intended use would improve consistency.
2. Line 175, a brief explanation of how random forests rank the contributions of different variables would be useful for readers.
3. Lines 315–317. Could the authors expand on the concept of multi-scale interactions and long-distance dependencies? For instance, are these related to temporal dimensions, or do they involve correlations between driving variables?
4. Line 225, which traditional statistical method was used to generate the test dataset? Additionally, in line 359, if a traditional statistical method serves as a reference, is it appropriate to compare it with machine learning models, and why do the machine learning approaches outperform the reference dataset?
5. Table 4, could the statistics for the Transformer_CNN be included?
6. Fig 5, the panels are not labeled.
7. For the published data, it would be helpful to have a QC indicator of measured and estimated values.
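
As an illustration of the QC indicator suggested in point 7, the following minimal sketch merges observed and model-estimated fluxes and records the provenance of each value. The flag codes, the function name `merge_with_qc`, and the sample values are hypothetical; the published dataset may use a different convention.

```python
# Hypothetical flag codes; a real dataset would document its own convention.
MEASURED, IMPUTED, MISSING = 0, 1, 2


def merge_with_qc(observed, estimated):
    """Prefer the observation; fall back to the model estimate; flag each value."""
    merged, flags = [], []
    for obs, est in zip(observed, estimated):
        if obs is not None:
            merged.append(obs)
            flags.append(MEASURED)
        elif est is not None:
            merged.append(est)
            flags.append(IMPUTED)
        else:
            merged.append(None)
            flags.append(MISSING)
    return merged, flags


# Hypothetical fluxes in W m-2; None marks a gap.
observed = [35.2, None, 41.0, None]
estimated = [34.8, 38.1, 40.2, None]

merged, flags = merge_with_qc(observed, estimated)
```

Keeping the flag as a separate column lets users filter to measured values only, or weight imputed values differently in later analyses.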

ED: Reconsider after major revisions (26 Sep 2024) by Le Yu

AR by Quanzhe Hou on behalf of the Authors (06 Nov 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Nov 2024) by Le Yu

RR by Ye Liu (11 Dec 2024)

RR by Anonymous Referee #7 (19 Dec 2024)

Suggestions for revision or reasons for rejection

Remark: Apparently, I am a new reviewer of this manuscript, which has undergone previous round(s) of review. I have tried to evaluate the responses to previous reviewers without raising completely new points. However, I do see some aspects that need considerable revision, e.g. the statistical analysis of the long-term trends. Also, some parts of the manuscript are still unclear.

Summary: The manuscript is focused on the filling of missing observations of turbulent heat fluxes over a 10-year period in a region of the Tibetan Plateau, using machine learning methods. The main conclusion is that these methods are superior to the classical statistical methods used so far.

Recommendation: Although this manuscript has apparently been previously reviewed by other reviewers, there are still a few important issues that require the attention of the authors. There are key methodological aspects that are not explained or not well explained. Also, the statistical analysis of the long-term trends of some environmental variables is not correct.

Main points

1) The methodology is based on using environmental drivers of the turbulent heat fluxes to fill in missing observations of those fluxes. Yet, I could not find in the manuscript a clear statement that those were indeed the environmental drivers used in the machine learning methods. I infer that they are those shown in Figure 3 and listed in Section 2, but it is something that the reader has to assume (?). Also, unless I missed it, the manuscript does not describe the sampling frequency of those data (hourly, daily, monthly?). It seems that the authors have analyzed hourly data (from Figure 2), but this is not clearly stated, and the sampling frequency is used to impute the missing observations.

2) The statistical analysis of trends shown in Figure 2 is not correct. The time series show a clear temporal autocorrelation, so the effective number of degrees of freedom should be smaller, possibly much smaller, than the number of time steps in the series. Actually, if the purpose of the analysis is to identify long-term trends, the use of hourly, daily, or even monthly data is not meaningful, since all display a strong annual cycle that masks the long-term trend. For that purpose, it is customary to use annual means, for which the temporal autocorrelation is probably negligible, and the number of degrees of freedom is then equal to the number of time steps. I guess the trends will not attain the level of significance included in Table 1.
The text is also a bit unclear. The authors state that they take a value of p=0.95, but actually, the p-value is estimated from the analysis (and included in Table 2), so mentioning a value of p=0.95 is confusing.

This being said, this section does not really fit into the main objective of the study, which is the reconstruction of missing observational values. The study is not about climate change in this area, for which a much longer time series than 10 years would be needed. So, the role that this section plays in the whole manuscript is not clear to me, even less so as the journal is about model and method development.

3) The results section compares several measures of skill for the different methods, but actually, those measures are not totally informative. The heat fluxes probably display a very strong annual cycle, so the question arises as to what the skill would be of a simple imputation scheme in which the missing value is just replaced by the long-term mean. Only this comparison would give meaningful information about the skill of the methods. It may well happen that all the machine learning methods do is estimate a mean annual cycle. An added value would only be justified if the skill is better than that of the climatology imputation.
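
The comparison requested in point 3 can be sketched as a skill score relative to a climatological baseline. This is a minimal illustration under stated assumptions: the climatology is binned by calendar month for brevity (hourly data would more plausibly be binned by day of year and hour), and all names and values below are hypothetical rather than taken from the manuscript.

```python
from collections import defaultdict


def monthly_climatology(months, values):
    """Mean of the observed values for each calendar month present in the data."""
    sums, counts = defaultdict(float), defaultdict(int)
    for m, v in zip(months, values):
        sums[m] += v
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}


def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def skill_vs_climatology(y_true, y_model, y_clim):
    """1 - MSE_model / MSE_clim: positive only if the model beats climatology."""
    return 1.0 - mse(y_true, y_model) / mse(y_true, y_clim)


# Hypothetical training observations: calendar month and flux in W m-2.
train_months = [1, 1, 7, 7]
train_flux = [20.0, 30.0, 90.0, 110.0]
clim = monthly_climatology(train_months, train_flux)

# Hypothetical held-out observations and model predictions.
test_months = [1, 7]
test_flux = [22.0, 108.0]
model_pred = [23.0, 106.0]
clim_pred = [clim[m] for m in test_months]

skill = skill_vs_climatology(test_flux, model_pred, clim_pred)
```

A score near zero would mean the model adds little beyond the mean annual cycle, and a negative score would mean it does worse than that baseline.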

Particular points

4) The title ‘A Transformer-CNN Imputation Model for Turbulent Heat Fluxes in the Tibetan Plateau Grassland’ can confuse some readers. Imputation of what? The title should include the keyword ‘missing observation’ somewhere.

5) Based on the turbulent heat flux from the third scientific expedition to the Tibetan Plateau in 2012, imputation evaluations were conducted using algorithms like

This sentence is also confusing for some readers. Which turbulent heat flux? Surely, at the land surface, but this needs to be stated, as this is the first sentence in the abstract.

6) ‘Results indicated that the Transformer model performed optimally.’

‘Optimally’ would mean that this method is the best of all possible methods. The study only tested a few methods.

7) ‘With global climate change, the ecosystems and water resources of the Tibetan Plateau have undergone significant impacts.’

Can the authors include a reference to support this sentence?

8) ‘imputation methods based on artificial neural networks; and (4) imputation methods based on machine learning algorithms.’

Neural networks belong to the family of machine learning methods.

9) Not only is this observation station influenced by climate variations and weather processes, but it is also affected by local circulations of the Himalayan range, such as valley winds, making it an ideal location for monitoring surface processes on the Tibetan Plateau

The way that climate change and weather processes impact the local station can only be through local processes. A station does not see the global climate.

10) ‘During midday in the summer, surface temperatures can rise to 60 °C, displaying a gradually increasing pattern throughout the year.’

What does ‘gradually increasing pattern’ mean? Do temperatures rise continuously from January to December?

10) Lastly, the imputed data is merged with the original sensible and latent heat fluxes to derive a comprehensive data set of environmental driving variables.

This sentence really confused me. The latent and sensible data are not part of the driving variables (?)

11) Table 1. What is X? This is explained in the text but not in the caption. Also, the text states that X is in hours, but relative to which moment in time? All this information needs to be included in the caption.

ED: Reconsider after major revisions (21 Dec 2024) by Le Yu

AR by Quanzhe Hou on behalf of the Authors (08 Jan 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (06 Feb 2025) by Le Yu

RR by Anonymous Referee #7 (14 Feb 2025)

ED: Publish subject to minor revisions (review by editor) (22 Feb 2025) by Le Yu

AR by Quanzhe Hou on behalf of the Authors (26 Feb 2025)

EF by Vitaly Muravyev (04 Mar 2025) Manuscript Author's response Author's tracked changes Supplement

ED: Publish as is (14 Mar 2025) by Le Yu

AR by Quanzhe Hou on behalf of the Authors (20 Mar 2025)

Journal article(s) based on this preprint

29 Jul 2025

Interpolating turbulent heat fluxes missing from a prairie observation on the Tibetan Plateau using artificial intelligence models

Quanzhe Hou, Zhiqiu Gao, Zexia Duan, and Minghui Yu

Geosci. Model Dev., 18, 4625–4641, https://doi.org/10.5194/gmd-18-4625-2025, 2025

Viewed

Total article views: 836 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
574	219	43	836	45	53

Views and downloads (calculated since 08 Jan 2024)

Month	HTML	PDF	XML	Total
Jan 2024	112	21	5	138
Feb 2024	47	12	5	64
Mar 2024	41	6	2	49
Apr 2024	45	17	10	72
May 2024	19	21	1	41
Jun 2024	49	23	4	76
Jul 2024	22	16	5	43
Aug 2024	18	4	2	24
Sep 2024	17	10	0	27
Oct 2024	11	6	0	17
Nov 2024	30	4	0	34
Dec 2024	12	8	0	20
Jan 2025	9	7	5	21
Feb 2025	19	5	1	25
Mar 2025	13	16	1	30
Apr 2025	16	12	1	29
May 2025	33	7	1	41
Jun 2025	34	15	0	49
Jul 2025	27	7	0	34
Aug 2025	0
Sep 2025	0
Oct 2025	0
Nov 2025	0
Dec 2025	0
Jan 2026	0
Feb 2026	0
Mar 2026	2	0	2
Apr 2026	0

Viewed (geographical distribution)

Total article views: 834 (including HTML, PDF, and XML) Thereof 834 with geography defined and 0 with unknown origin.

Latest update: 11 Apr 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

This study assesses turbulent heat flux data imputation at the Qinghai-Tibet Plateau using various machine learning models. The Transformer model emerged as the most effective, leading to the creation of the Transformer_CNN model, which integrates global and local attention mechanisms. Experimental results showed that Transformer_CNN surpassed other models in performance. This model was effectively used to impute the station's heat flux data from 2007 to 2016.

