A hybrid Kolmogorov-Arnold networks-based model with attention for predicting Arctic River streamflow
Abstract. Arctic rivers represent important components of the Arctic and global hydrological and climate systems, serving as dynamic conduits between terrestrial and marine environments in some rapidly changing regions. They transport freshwater, sediments, nutrients, and carbon from vast watersheds to the Arctic Ocean and affect ocean circulation patterns and regional climate dynamics. Despite their importance, modeling Arctic rivers remains challenging because of sparse data networks, unique cryospheric dynamics, and complex responses to hydrometeorological variables. In this study, a novel hybrid deep learning model is developed to address these challenges and predict Arctic River discharge by incorporating Kolmogorov-Arnold Networks (KAN), Long Short-Term Memory, and the attention mechanism with seasonal trigonometry encoding and physics-based constraints. It integrates several novel components: 1) The KAN-based deep learning component learns and captures intricate temporal patterns from nonlinear hydrometeorological data; 2) Explicit physical constraints designed for the characteristics of permafrost-dominated watersheds govern snow accumulation and melt processes through the architectural design and loss function; 3) The seasonal variations are accounted for using trigonometry functions to represent cyclical patterns; 4) A residual compensation stricture allows the proposed model to revisit systematic errors in initial predictions and helps capture complex nonlinear processes that are not fully represented. The Kolyma River, which is significantly dominated by the permafrost, is adopted to test the performance of the newly developed model. It obtains more robust and accurate predictive performance compared to baseline models. The role of physical constraints, the residual compensated architecture, and the trigonometry encoding are assessed by ablation analysis.
The results indicate that these components positively contribute to improving the predictive performance. This novel approach addresses the unique challenges of hydrological forecasting in cold, permafrost-dominated regions and provides a robust framework for predicting Arctic River discharge under changing climate conditions.
Zhou and Liu present a novel data-driven approach to discharge modelling. It is based on a Kolmogorov-Arnold network combined with a Long Short-Term Memory (LSTM) model, an attention mechanism, a trigonometric representation of seasonal patterns, and a physics-based constraint. The newly developed model aims to improve discharge prediction in Arctic regions, with their particular characteristics such as permafrost and the accumulation and melting of snow over longer periods. To this end, the model was applied to discharge data of the Kolyma River in Siberia, and its predictions were evaluated against those of several simpler models.
I have found the presented modelling approach to be a novel and valuable contribution to the hydrological modelling community, and I believe it fits the scope of the journal. However, the manuscript needs further work on the methodology section and the discussion.
Major comments:
Line 30: I cannot really support the statement that the presented framework is (better) suited for predicting Arctic River discharge under changing climate conditions. It is quite likely that climate change will alter the general behaviour of these catchments, and with it the processes that generate discharge. The model would then operate outside the conditions represented in its training data and would have to extrapolate, a task for which data-driven models are poorly suited.
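One way to make this concern concrete would be to check how much of the forcing data in a held-out or future period lies outside the range covered during training. The following is a minimal sketch, assuming the inputs are arranged as arrays of shape (samples, features); the function name is hypothetical and not part of the manuscript. A value inside every per-feature range can still form a combination never seen in training, so this check only detects the most obvious cases of extrapolation.

```python
import numpy as np


def fraction_outside_training_range(train: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Per-feature share of test samples lying outside the training min-max range.

    Both arrays are assumed to be shaped (n_samples, n_features).
    """
    lo = train.min(axis=0)  # smallest value seen during training, per feature
    hi = train.max(axis=0)  # largest value seen during training, per feature
    outside = (test < lo) | (test > hi)  # True where a test value lies beyond the training range
    return outside.mean(axis=0)  # fraction of test samples outside the range, per feature
```

Applied, for example, to temperature and precipitation inputs of a warmer late period after training on an earlier period, a clearly non-zero fraction would indicate that the model is being asked to extrapolate.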
Line 137, Figure 1: I do not think this figure is well chosen, as important aspects are missing. I would rather use a figure that shows the catchment itself, including its topography.
Line 138-143: These lines are unnecessary here and can probably be deleted. Their content has already been stated in the introduction and is explained in the methodology section anyway.
Line 144-145: All steps that are necessary for the actual model runs should come after the model description; otherwise, the order is confusing.
Line 146-164: The overall model structure should be described after the individual components have been explained. Figure 2 should also be moved there.
I recommend including an additional efficiency measure such as the Kling-Gupta efficiency (KGE), which is complementary to the other measures and also accounts for other aspects of the discharge, such as bias. Please also cite it and state which version of the KGE you use.
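To make this request concrete, a minimal sketch of the two most commonly used KGE versions follows: the original formulation of Gupta et al. (2009), which uses the ratio of standard deviations as its variability term, and the modified KGE' of Kling et al. (2012), which uses the ratio of coefficients of variation instead. The function name and the `variant` switch are illustrative assumptions, not part of the manuscript.

```python
import numpy as np


def kge(obs, sim, variant: str = "2009") -> float:
    """Kling-Gupta efficiency of simulated vs. observed discharge.

    variant="2009": Gupta et al. (2009), variability term = sigma_sim / sigma_obs.
    variant="2012": Kling et al. (2012), variability term = CV_sim / CV_obs.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]  # linear (Pearson) correlation
    beta = sim.mean() / obs.mean()  # bias ratio
    if variant == "2009":
        var_term = sim.std() / obs.std()  # ratio of standard deviations
    elif variant == "2012":
        # ratio of coefficients of variation, which decouples variability from bias
        var_term = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    else:
        raise ValueError("variant must be '2009' or '2012'")
    return float(1.0 - np.sqrt((r - 1.0) ** 2 + (var_term - 1.0) ** 2 + (beta - 1.0) ** 2))
```

Because the two versions react differently to a simulation that is biased in volume, reporting which one was used matters when comparing values across studies.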
Why does the methodology end here? Important parts that appear later in the results section are missing. The methodology should explain that the final model is compared with specific baseline models and how these differ from the new model presented here. Furthermore, a description of how the model is trained is missing entirely: the number of runs, the stopping criterion, the hyperparameters, and so on.
Line 323-327: This is methodology and belongs in the methods section, where it is currently missing, not in the results.
Line 328-329: As mentioned earlier, the baseline models should not be introduced for the first time in the results.
Line 343-344: Boxplots cannot be "conducted". Do you mean that you ran the model application 10 times?
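If the boxplots do summarise ten repetitions, the text should state what varied between them (for example, the random seed used for weight initialisation) and what each box shows. A minimal sketch of such a protocol, assuming the repetitions differ only in the random seed; `train_and_score` is a hypothetical placeholder for the authors' own training and evaluation routine.

```python
from typing import Callable

import numpy as np


def summarise_repeated_runs(train_and_score: Callable[[int], float], n_runs: int = 10) -> dict:
    """Train and evaluate the model once per random seed and summarise the scores.

    train_and_score is a hypothetical callable that trains the model with the given
    seed and returns a test-period score (e.g. NSE).
    """
    scores = np.array([train_and_score(seed) for seed in range(n_runs)], dtype=float)
    q1, median, q3 = np.percentile(scores, [25, 50, 75])  # box edges and centre line
    return {"scores": scores, "min": scores.min(), "q1": q1,
            "median": median, "q3": q3, "max": scores.max()}
```

Note that plotting libraries often draw whiskers at 1.5 times the interquartile range rather than at the minimum and maximum, so the figure caption should say which convention is used.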
Line 357: The y-axis of Figure 6 seems to be cut off; the tick labels are partly missing.
Line 361: I do not see how this represents the "spectrum of hydrological variability". As I understand it, this analysis rather shows how the model performs when the data are only available at a coarser temporal resolution. How does this assess the representation of hydrological variability?
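To illustrate the point: aggregating to coarser time steps averages out short-term errors, which can raise efficiency scores without saying much about how well the model represents hydrological variability. A minimal sketch, assuming daily series and using fixed 7- and 30-day blocks as a simple stand-in for weekly and monthly aggregation (calendar months would require date handling); the function names are hypothetical.

```python
import numpy as np


def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def block_mean(x: np.ndarray, window: int) -> np.ndarray:
    """Average consecutive, non-overlapping windows; an incomplete final window is dropped."""
    n = len(x) // window * window
    return x[:n].reshape(-1, window).mean(axis=1)


def nse_by_aggregation(obs, sim, windows=(1, 7, 30)) -> dict:
    """NSE of the same simulation evaluated at several aggregation periods (in days)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return {w: nse(block_mean(obs, w), block_mean(sim, w)) for w in windows}
```

A simulation whose day-to-day errors cancel within each block will score almost perfectly at the 30-day level even though its daily dynamics are wrong, which is why a rising NSE with aggregation period says little by itself about the depiction of variability.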
Line 405: Figure 8: Are these results for an aggregation period of one month?
Line 407-415: This is all methodology, not results.
Line 437-448: I do not think this part is necessary here. The conclusion should not summarise the whole paper but restate the key findings.
Line 455-456: The river discharge has a long memory? This sentence does not make sense. A more thorough discussion is needed of why the model efficiency behaves this way across the different aggregation periods; the reason must lie in the model structure and in how it fits the temporal pattern of discharge.
Overall, I feel that the discussion lacks depth. While I personally recommend separating the results and the discussion, you may keep them together if that makes sense overall. In the current state, however, the explanation of the observed model behaviour lacks depth. For example, line 462-463: Has the same been observed when other models were applied? Is this a common problem? Additional citations and comparisons with other studies along these lines would help place the paper in a broader context.
I also miss a figure showing the observed (gauged) discharge together with the simulated discharge. Such a figure would give the reader an idea of how the model behaves, where it deviates from the gauging data, and where it agrees closely with them.
Minor comments:
Line 22: structure
Line 24: dominated by permafrost
Line 27: ...that these components improve the predictive performance.
Line 46: These temperature dependent transitions...?
Line 128-129: Why is there no citation for the dataset?
Line 178: 1) Input expansion
Line 183-185: Kolmogorov-Arnold theorem while avoiding the computational overhead
Line 195: GELU
Line 196: Figure 3 is not referenced in the text.
Line 200: ...mechanism and a hidden state, an LSTM can efficiently regulate...
Line 209: The memory cell of an LSTM is primarily composed...
Line 240: "Q refers the discharge prediction using the context vector calculated from the context vector." This should read "refers to", and what is "using the context vector calculated from the context vector" supposed to mean?
Line 273: I recommend a semicolon after water.
Line 279: caused by sources, such as model simplifications...
Line 285-286: It may be better to rephrase the sentence and describe alpha and beta as parameters that have to be fitted during model training.
Line 299: beneficial
Line 303-304: What is cited here? The Nash-Sutcliffe efficiency measure should be properly cited.
Line 330: I recommend introducing the name of the new model, RCPIKLA, earlier rather than in the results.
Line 396: change "better captures"
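Regarding the comment on line 303-304, the standard definition of the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970, Journal of Hydrology, 10(3), 282-290) can be written as below. The symbols are illustrative and may differ from the manuscript's notation: $Q_o^t$ and $Q_s^t$ denote observed and simulated discharge at time step $t$, $T$ the number of time steps, and $\overline{Q}_o$ the mean observed discharge.

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left( Q_o^t - Q_s^t \right)^2}{\sum_{t=1}^{T} \left( Q_o^t - \overline{Q}_o \right)^2}
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than using the mean observed discharge as a predictor.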