Stable Stream Temperature Prediction for Different Basins Using Time Series Encoding and Temporal Convolutional Networks
Abstract. Stream temperature prediction is essential for assessing the health of river ecosystems. In tasks that predict river water temperature across different basins, especially basins in different climatic regions, water temperature data sets are often inconsistently available. At the same time, spatial heterogeneity within river basins significantly complicates water temperature prediction, making it challenging to establish a prediction model with strong generalization capability and stable results. To address this problem, moving average encoding and day-of-year (DOY) encoding of the time series data were incorporated into a temporal convolutional network, yielding a temporal convolutional network model with time series encoding (TimENC-TCN). The model effectively captured multimodal features of dynamic water temperature data from complex, random time series and produced stable prediction results in different river basins. Thirteen hydrological stations across four major rivers (Thames, Colorado, Mississippi and Sacramento) were used to test the proposed TimENC-TCN model and to compare its performance with reference models (Air2Stream, NARX, GRU and Gboost). The results showed that the enhanced features performed well in rivers subject to human intervention, and that air temperature and DOY were important variables influencing water temperature prediction. In cross-basin water temperature prediction tasks, the proposed model delivered more stable and accurate predictions (average RMSE on the test set at least 8.7 % better than the comparison models). Considering its features and performance, the proposed model is a promising approach for reconstructing stream temperatures in data-scarce areas of multiple river basins.
This manuscript addresses the important problem of stream water temperature prediction across multiple river basins, a task that is highly relevant for river ecosystem assessment and management. The authors propose an enhanced temporal convolutional network (TimENC-TCN) that incorporates moving average encoding and day-of-year (DOY) encoding to improve model generalization under data-scarce and heterogeneous conditions. The model is evaluated using observations from multiple hydrological stations across four major river basins and is compared against several established reference models. Overall, the study tackles a timely and challenging topic and presents a modeling framework with potential applicability to cross-basin stream temperature prediction. This is a good study with appropriate methods, and the manuscript is generally clear and easy to follow. I believe it should be suitable for publication once the following issues are addressed.
Major: Several key conclusions in the manuscript appear insufficiently supported by the presented results. For example, the statement on line 273 (“Therefore, it is reasonable to infer that at these stations where performance has declined, human factors have masked the effects of natural factors”) does not appear fully justified. This inference does not hold consistently across stations in the Sacramento River. Specifically, station J (RMSE = 0.969 °C) contradicts this pattern, and the RMSE values at stations H (1.29 °C) and B (1.189 °C) are quite similar, undermining a clear distinction. Additionally, the manuscript does not give the precise locations of station I (Verona) and the other two Sacramento River stations, which are needed to assess the validity of this conclusion.
Similarly, the claim on line 285—that the TimENC-TCN model demonstrates better ability to handle spatial heterogeneity in basins influenced by natural factors compared to those influenced by human factors—relies heavily on observations from a single station. This limits the strength of the conclusion. Moreover, the lower model performance at this station might be due to other factors such as measurement errors in water or air temperature data, or differences in input data volume, rather than human influence alone. Overall, clarifications on station locations, more consistent evidence across multiple stations, and a cautious interpretation of results are needed to strengthen these conclusions.
Minor:
Lines 43–45: I suggest removing the initial “And” in the second sentence for better flow.
Line 75: I suggest including a figure showing the locations of the stations to help readers visualize the geographical differences among them. This would clarify how the stations’ distinct environments may influence the results.
Line 86, Table 1: I suggest including the watershed area for each station to provide additional context on the catchment characteristics.
Line 100: The Methods section should describe the approach used to evaluate model performance under gradual sample removal, detailing how samples were removed and how performance was assessed at each step.
Line 190: References are missing for the developers of Air2Stream and the other models mentioned. Please include appropriate citations to acknowledge the original sources.
Line 259: The table showing “Option Feature input Feature output” appears to be missing a caption. I believe this is Table 3 and recommend adding an appropriate caption for clarity.
Line 315: The authors tested the model’s generalization performance by removing samples at Purfleet station (total data volume = 5718). While this approach is appropriate, the study’s conclusions would be strengthened by including at least one additional station in the generalization analysis, such as Cisco station (total data volume = 2100).
Line 337: I suggest rephrasing this sentence to: “Meanwhile, the case of the Verona station and experiments on the drivers of stream temperature changes (Alger et al., 2021; Wade et al., 2023) indicate that introducing other features might be necessary at stations with significant human interference.”
For Figures 2, 5, 7, and 8, I suggest increasing the font size of the axis values to improve readability.
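To illustrate the level of detail requested for the sample-removal analysis (comments on Lines 100 and 315), a minimal sketch follows. It is not the authors' procedure: the random, nested removal scheme, the function names, the synthetic data, and the simple linear air-to-water temperature model standing in for TimENC-TCN are all assumptions made for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def sample_removal_curve(x_train, y_train, x_test, y_test, keep_fractions, seed=0):
    """Test RMSE after retraining on progressively smaller training subsets.

    Assumption: samples are removed at random and the subsets are nested
    (each smaller subset is contained in the larger ones). The manuscript
    may use a different scheme, e.g. removing contiguous periods.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x_train))  # fixed order makes subsets nested
    results = {}
    for frac in keep_fractions:
        n_keep = max(2, int(round(frac * len(x_train))))
        idx = order[:n_keep]
        # Hypothetical stand-in model: water_temp = a * air_temp + b
        a, b = np.polyfit(x_train[idx], y_train[idx], deg=1)
        results[frac] = rmse(y_test, a * x_test + b)
    return results


# Synthetic data for illustration only (not from the manuscript).
data_rng = np.random.default_rng(42)
air = data_rng.uniform(0.0, 30.0, size=600)
water = 0.8 * air + 2.0 + data_rng.normal(0.0, 1.0, size=600)

curve = sample_removal_curve(
    air[:500], water[:500], air[500:], water[500:],
    keep_fractions=[1.0, 0.5, 0.25, 0.1],
)
for frac, err in curve.items():
    print(f"kept {frac:.0%} of training samples: test RMSE {err:.3f} degC")
```

Reporting such a curve for a second station, with the removal scheme and random seed stated explicitly, would make the generalization analysis reproducible.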