This work is distributed under the Creative Commons Attribution 4.0 License.
A Study on the Transformer-CNN Imputation Method for the Turbulent Heat Flux Dataset in the Qinghai-Tibet Plateau Grassland
Abstract. Based on the turbulent heat flux from the third scientific expedition to the Qinghai-Tibet Plateau in 2012, imputation evaluations were conducted using algorithms such as Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and the Transformer model with a deep self-attention mechanism. Results indicated that the Transformer model performed best. To further enhance imputation accuracy, a combined model of the Transformer and a Convolutional Neural Network (CNN), termed Transformer_CNN, was proposed. Herein, while the Transformer primarily provides global attention, the convolution operations in the CNN give the model local attention. Experimental outcomes revealed that the imputations from Transformer_CNN surpassed the traditional single artificial-intelligence model approaches. The coefficient of determination (R^2) reached 0.949 on the sensible heat flux test set and 0.894 on the latent heat flux test set, confirming the applicability of the Transformer_CNN model for data imputation of turbulent heat flux on the Qinghai-Tibet Plateau. Finally, the station's turbulent heat flux observational database from 2007 to 2016 was imputed using the Transformer_CNN model.
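The abstract's contrast between the Transformer's global attention and the CNN's local attention can be made concrete with a minimal numpy sketch (not the paper's implementation; array shapes and the toy query/key choice are illustrative assumptions). Self-attention connects every time step to every other, while a 1-D convolution only mixes a local window:

```python
import numpy as np

def attention_weights(x):
    """Dense self-attention weights: every time step attends to every
    other (global receptive field). Toy setting: queries and keys are
    the (T, d) input itself."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)         # (T, T) similarity, fully connected
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def conv_receptive_mask(T, kernel=3):
    """Banded mask: a 1-D convolution only mixes a local window of steps."""
    idx = np.arange(T)
    return (np.abs(idx[:, None] - idx[None, :]) <= kernel // 2).astype(float)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))           # 8 time steps, 4 features
A = attention_weights(x)                  # global: all 8*8 entries nonzero
M = conv_receptive_mask(8, kernel=3)      # local: nonzero only near diagonal
```

In the combined model, the two receptive-field patterns are complementary: the dense matrix `A` carries long-range dependencies, the banded mask `M` shows where convolutions contribute local context.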
Status: open (until 04 Mar 2024)

RC1: 'Comment on egusphere-2023-2685', Anonymous Referee #1, 07 Feb 2024
The aim of this study is to build and validate an original method for reconstructing missing data on turbulent heat fluxes measured by eddy covariance (EC) at the QOMS station in Tibet. Time series covering a 10-year period and presenting gaps are supplemented by methods independent of the physical relationships between fluxes and environmental variables. These methods are based on machine learning (ML) and use continuous time series of 16 to 18 environmental variables at the same site. A validation with statistical indicators shows good performance for all methods, and the superiority of the method built by the authors, based on the Transformer (Transformer_CNN).
The article is interesting, and the method used provides very conclusive results of reconstructed fluxes compared with EC observations, over fairly long periods of time. However, in my opinion, it lacks context and details on certain aspects of the method. Some method choices are not sufficiently justified, and the conclusion could usefully include a discussion of the use of the method on other flux datasets, or of its limitations. The advantage of using a purely ML-based approach (without information on the physical links between fluxes and variables) should also be discussed. Below are my main questions/comments:
Major comments
1. There is an overall lack of general context and discussion of the added value of the Transformer_CNN method: why do the authors directly use the ML-based reconstruction without trying a more classical (physics-based) method of flux computation? This paper is submitted to GMD as a ‘development and technical paper’; as such, it should clearly assess the performance of the model presented with respect to existing methods (not limited to ML-based gap filling). If the aim of the study is rather, as stated l. 93–94, to ‘complete the imputation of turbulent heat flux for this site spanning from 2007 to 2016 and make this dataset publicly accessible’, this study would be more usefully published as a ‘data paper’ in a dedicated journal. Can the authors explain why a basic flux computation algorithm is not usable here? If so, is it due to the different regimes of atmospheric conditions and soil covers encountered at the site throughout the year? Also, can you add information about the significance of the differences between the methods, based on the statistical indicators used in part 4? The SVM method already provides very good results in my view. Is the difference between the SVM and Transformer_CNN MAE significant? Or the distance between the SVM and Transformer_CNN positions in the Taylor diagram in Fig. 5a? The SVM method is simple and ready-to-use; what is the added value of developing a new ML-based method for gap filling time series of EC fluxes? This added value could be efficiently demonstrated by a comparison of the resulting time series (by adding the SVM reconstruction to Fig. 8, for instance, or to a close-up of it). There would also be an interest in identifying the time periods where ML algorithms yield different results from physical parameterization, to demonstrate the contribution of ML, and possibly in discussing the reasons for this discrepancy.
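The significance question the referee raises (is the MAE gap between SVM and Transformer_CNN real or noise?) is commonly answered with a paired bootstrap over the test points. A generic numpy sketch with synthetic residuals standing in for the two models' test errors (all numbers here are illustrative, not the paper's):

```python
import numpy as np

def paired_bootstrap_mae(err_a, err_b, n_boot=2000, seed=0):
    """Resample the same test points for both models; return the
    fraction of resamples in which model B's MAE is NOT lower than
    model A's (a one-sided p-value-like score)."""
    rng = np.random.default_rng(seed)
    err_a, err_b = np.abs(err_a), np.abs(err_b)
    n = len(err_a)
    worse = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)        # paired resample with replacement
        if err_b[idx].mean() >= err_a[idx].mean():
            worse += 1
    return worse / n_boot

# Synthetic residuals standing in for SVM (a) and Transformer_CNN (b).
rng = np.random.default_rng(1)
res_svm = rng.normal(0.0, 12.0, 365)
res_tcnn = rng.normal(0.0, 10.0, 365)
p = paired_bootstrap_mae(res_svm, res_tcnn)
```

A small `p` would indicate that the MAE improvement survives resampling of the test set; pairing on the same indices is what accounts for the two models being evaluated on identical test points.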
2. The physical meaning of the results is probably worth a discussion. In 2.3, what is the physical meaning of the variables ranking first for H/LE? Please comment on why the numbers of variables, and the variables themselves, differ between H and LE. Some correlations between subgroups of variables are probably rather high (e.g. Ta_2m and Ta_1.5m, RH_1.5m and RH_2m): could the same results be obtained with fewer variables? Not all sites provide measurements of the soil temperature between the surface and 4 m, or air temperature between 1.5 and 10 m. Could the same (very good) fit be obtained with measurements at the first levels (Ts 0m and Ta 1.5m) only?
3. Can the method presented here be used directly, or with adaptation, at different sites? It would be really interesting to add sensitivity tests of the importance of the different variables selected in the H (18) and LE (16) subgroups. Would the fit be significantly lower when excluding RH_2m and/or RH_4m from the H subgroup?
4. The preprocessing part is insufficiently explained. Why is it relevant to use random forest (RF) rather than a principal component analysis (PCA) to select relevant variables? PCA is based on linear correlation between variables. Are you sure that the linear model is not appropriate here? Did you check that a simple correlation/covariance analysis does not lead to the same results as in Fig. 3? Also, can you please briefly explain the meaning of the ‘importance ranking’ (Fig. 3a and b) and, more importantly, of the ‘OOB score’ for readers not familiar with the RF terminology?
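The ‘OOB score’ the referee asks to have explained rests on one mechanism: each bootstrap resample leaves out roughly a third of the rows, and those held-out rows give a built-in validation set. A minimal numpy sketch of that mechanism, with a deliberately trivial ‘tree’ (each bootstrap model just predicts its in-bag mean; a real RF fits a decision tree and also accumulates the per-tree importance decreases that make up the importance ranking):

```python
import numpy as np

def oob_r2(y, n_trees=200, seed=0):
    """Out-of-bag R^2 with mean-predictor 'trees': each row is scored
    only by the bootstrap models that never saw it, then the aggregated
    OOB predictions are compared against the observations."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pred_sum = np.zeros(n)
    pred_cnt = np.zeros(n)
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, n)              # bootstrap resample
        oob = np.setdiff1d(np.arange(n), in_bag)    # ~37 % of rows left out
        pred_sum[oob] += y[in_bag].mean()           # 'tree' prediction
        pred_cnt[oob] += 1
    mask = pred_cnt > 0
    y_hat = pred_sum[mask] / pred_cnt[mask]
    ss_res = ((y[mask] - y_hat) ** 2).sum()
    ss_tot = ((y[mask] - y[mask].mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(7)
y = rng.normal(0.0, 1.0, 200)
score = oob_r2(y)       # near 0 here, since the toy 'tree' has no skill
```

With real trees the OOB score approaches an honest generalization estimate without setting aside a separate validation set, which is why it is routinely reported alongside RF importance rankings.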
5. The building of the Transformer_CNN method itself is insufficiently detailed for readers not familiar with ML in general. Is it something new, built on purpose for the present study, or has it been used previously? In the first case, can you please elaborate on the reasons leading to the choice of the different steps and modules used? In the latter case, please provide some references. It would also be interesting to study the attention weights determined by the Transformer to analyze the causal link between certain variables and the reconstructed fluxes. This would provide a perspective on the physical interpretability of the Transformer's performance.
Minor comments
Abstract: the RF is not part of the methods evaluated with SVM and so on. Please reformulate.
What is the sampling of the variables used to reconstruct the fluxes? I guess hourly samples (l. 210 and for the study of the diurnal cycle), but daily values are probably used in Fig. 2, Fig 6 and Fig 9? Please specify.
Figure 2: the legend is flawed and incomplete, please complete.
Table 1: please discuss the statistics and trends provided here. Are these figures useful for the main analysis presented in the paper?
Please use the same acronyms for the variables throughout the text, tables and figure legends (e.g. Table 1 and Table 2).
Please define MAE, RMSE, FC.
Some references are not correctly cited in the text (l. 39).
l. 181: please explain where this value of 159 comes from.
l. 378–388: this analysis is interesting and could be completed/made more specific by making explicit reference to the variables influencing the flux variations (Fig. 3).
Tables 4 and 5, and Fig. 5 are somehow redundant. Figure 5 can probably be moved to an appendix.
Figure 6 is ill-designed. Please use the same scale for the x and y axes and enlarge to be sure to include all the data.
Figure 8 is not very clear: I guess that the reconstructed values (red) are masked by the observations (purple) when both are present. Plotting the reconstructed values above the observed ones would probably make it clearer.
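Of the metrics the referee asks to be defined, MAE, RMSE and R^2 have standard definitions; a minimal numpy sketch of the usual formulas follows (FC is omitted, since its definition is specific to the paper under review):

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error: mean of |simulated - observed|."""
    return np.mean(np.abs(sim - obs))

def rmse(obs, sim):
    """Root-mean-square error: sqrt of the mean squared residual."""
    return np.sqrt(np.mean((sim - obs) ** 2))

def r2(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - np.mean(obs)) ** 2)
    return 1 - ss_res / ss_tot

obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.1, 1.9, 3.2, 3.8])
```

MAE penalizes all residuals linearly, RMSE weights large residuals more heavily, and R^2 expresses the fraction of observed variance reproduced by the model, which is why all three are usually reported together.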
Citation: https://doi.org/10.5194/egusphere-2023-2685-RC1
RC2: 'Comment on egusphere-2023-2685', Anonymous Referee #2, 19 Feb 2024
The paper compares a few machine/deep learning (ML/DL) methods for imputing sensible (H) and latent (L) heat fluxes. The driving variables used for this purpose are meteorological variables such as temperature. Overall, the results are quite promising in that one of the DL methods used in the paper performs quite well. But the paper is not ready for publication yet.
Major comments:
(1) The methodology is not described adequately. All the ML/DL models have several hyperparameters that need to be carefully selected to develop an optimal model. For example, how many LSTM layers were used? How many neurons in each LSTM layer? What was the learning rate? There is some randomness in the training of DL models; therefore, several models (8–10) are usually developed with different random seeds. Was this approach adopted in the paper? I know that randomness can be quite significant, at least for LSTMs. As another example, how were the kernel widths in the convolutional layers determined? These considerations are really important.
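The seed-ensemble practice the referee describes (train several seeds, report or average them) can be shown on a toy problem. A hedged numpy sketch, where a one-parameter model fit by SGD stands in for a DL model; the seed changes both the random initialization and the sample order:

```python
import numpy as np

def fit_sgd(x, y, seed, lr=0.05, epochs=30):
    """Toy model y ~ w*x fit with per-sample SGD from a random init;
    the learned w depends on the seed via init and visiting order."""
    rng = np.random.default_rng(seed)
    w = rng.normal()                        # seed-dependent initialization
    for _ in range(epochs):
        for i in rng.permutation(len(x)):   # seed-dependent sample order
            grad = 2.0 * (w * x[i] - y[i]) * x[i]
            w -= lr * grad
    return w

rng = np.random.default_rng(42)
x = rng.uniform(0.5, 2.0, 50)
y = 3.0 * x + rng.normal(0.0, 0.3, 50)      # true slope = 3

ws = np.array([fit_sgd(x, y, seed) for seed in range(8)])  # 8 seeded runs
w_ens = ws.mean()                            # seed-ensemble estimate
```

Reporting the spread of `ws` quantifies the training randomness the referee asks about, and averaging across seeds damps it, which is the point of running 8–10 models.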
(2) RF has been used for feature selection. Why? If there are some redundant features, those would be taken care of by the ML/DL algorithms, except the KNN method. Also, why not include RF as one of the ML algorithms to impute the H and L values? RF is surely better than the KNN method. Also, note that RF can be thought of as an adaptive KNN, so it seems better to use RF than KNN.
(3) The driving variables in this study also had missing values, which were imputed using the KNN method. First of all, the accuracy of the KNN method in imputing these driving variables needs to be established. Second, why not try other methods, such as random forest, for imputing the driving variables? Also, why select 3 nearest neighbors? Were other combinations tried? The method used to compute distance in the KNN approach has not been described either.
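The undocumented choices the referee lists (k, the distance metric, the aggregation over neighbors) are exactly the knobs of a KNN imputer. A generic numpy sketch, not the paper's implementation, using Euclidean distance on the observed columns and a plain neighbor mean with k = 3:

```python
import numpy as np

def knn_impute_row(X, row, k=3):
    """Fill NaNs in one row from the k nearest complete rows.
    Distance: Euclidean over the columns observed in `row`.
    Fill value: mean of the k neighbours in the missing columns."""
    obs = ~np.isnan(row)
    complete = X[~np.isnan(X).any(axis=1)]            # candidate donor rows
    d = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
    neigh = complete[np.argsort(d)[:k]]               # k closest donors
    filled = row.copy()
    filled[~obs] = neigh[:, ~obs].mean(axis=0)        # neighbour mean
    return filled

X = np.array([[1.0, 2.0],
              [1.1, 2.1],
              [5.0, 6.0],
              [0.9, 1.9]])
row = np.array([1.0, np.nan])
```

Each of the hard-coded choices here (Euclidean distance, unweighted mean, k = 3) would change the imputed values, which is why the referee asks for them to be stated and tested.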
(4) The methodology for testing the different DL/ML methods is not rigorous enough. A total of 10 years of data are used, where 9 years are used for training and 1 year (2012) for testing. This methodology should be repeated with each year serving as the test year in turn: use the data from 2007 as the test data and the rest for training; then use 2008 as the test year and the rest for training, and so on.
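The leave-one-year-out protocol the referee proposes is a few lines of plain Python; a sketch of the split generator (year range taken from the study period stated in the abstract):

```python
def leave_one_year_out(years):
    """Yield (train_years, test_year) splits: every year serves once
    as the held-out test set, the remaining years form the training set."""
    for test_year in years:
        train = [y for y in years if y != test_year]
        yield train, test_year

# 2007-2016 inclusive, as in the study period.
splits = list(leave_one_year_out(range(2007, 2017)))
```

Reporting the test metrics for all 10 splits (rather than for 2012 alone) would show whether the Transformer_CNN advantage holds across interannually varying conditions.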
(5) The generalizability of the results is unclear, mainly because several different meteorological variables are used as driving variables. Would these variables be available at other sites as well? If not, the method cannot be generalized. It would be nice to see the model performance with different sets of predictor variables. Say, if only surface temperature data are available as the driving variable, how good would the imputation be?
I have other specific comments in the attached pdf.