A hybrid Kolmogorov-Arnold networks-based model with attention for predicting Arctic River streamflow

Zhou, Renjie; Liu, Shiqi

doi:10.5194/egusphere-2025-3540

Preprints

https://doi.org/10.5194/egusphere-2025-3540

Preprints

08 Sep 2025

| 08 Sep 2025

Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

A hybrid Kolmogorov-Arnold networks-based model with attention for predicting Arctic River streamflow

Renjie Zhou and Shiqi Liu

Abstract. Arctic rivers represent important components of the Arctic and global hydrological and climate systems, serving as dynamic conduits between terrestrial and marine environments in some rapidly changing regions. They transport freshwater, sediments, nutrients, and carbon from vast watersheds to the Arctic Ocean and affect ocean circulation patterns and regional climate dynamics. Despite their importance, modeling Arctic rivers remains challenging because of sparse data networks, unique cryospheric dynamics, and complex responses to hydrometeorological variables. In this study, a novel hybrid deep learning model is developed to address these challenges and predict Arctic River discharge by incorporating Kolmogorov-Arnold Networks (KAN), Long Short-Term Memory, and the attention mechanism with seasonal trigonometry encoding and physics-based constrains. It integrates several novel components: 1) The KAN-based deep learning component learns and captures intricate temporal patterns from nonlinear hydrometeorological data; 2) Explicit physical constrains designed for the characteristics of permafrost-dominated watersheds govern snow accumulation and melt processes through the architectural design and loss function; 3) The seasonal variations are accounted for using trigonometry functions to represent cyclical patterns; 4) A residual compensation stricture allows the proposed model to revisit systematic errors in initial predictions and helps capture complex nonlinear processes that are not fully represented. The Kolyma River, which is significantly dominated by the permafrost, is adopted to test the performance of the newly developed model. It obtains more robust and accurate predictive performance compared to baseline models. The role of physical constraints, the residual compensated architecture, and the trigonometry encoding are assessed by ablation analysis. The results indicate that these components positively contribute to improving the predictive performance. This novel approach addresses the unique challenges of hydrological forecasting in cold, permafrost-dominated regions and provides a robust framework for predicting Arctic River discharge under changing climate conditions.

Received: 24 Jul 2025 – Discussion started: 08 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Renjie Zhou and Shiqi Liu

Status: open (until 24 Nov 2025)

Post a comment Subscribe to comment alert

RC1:
'Comment on egusphere-2025-3540', Anonymous Referee #1, 21 Oct 2025 reply

Zhou and Liu present a novel approach for a data-driven model for discharge modelling. It is based on a Kolmogorov-Arnold network combined with a Long-Short Term Memory (LSTM) model, an attention mechanism that includes a trigonometric depiction of seasonal patterns, as well as a physics-based constrain. The newly developed model aimed at improving the prediction of discharge within arctic areas with their special characteristics like perma frost and accumulation and melting of snow over longer periods. Therefore, the model was applied to the discharge data of the Kolyma River in Siberia and the prediction evaluated against the predictions of several other simpler models.
I have found the presented modelling approach to be a novel and valuable contribution to the hydrological modelling community. I believe it to be fitting for the scope of the Journal. However, the presented manuscript needs work regarding the methodology section as well as the discussion.

Major comments:

Line 30: I can't really support the statement that the presented framework is (better) suited for predicting Arctic River discharge under changing climate conditions. It is well likely that climate change impacts the respective catchments in a way that the general behaviour changes - which also alters how discharge forms. I then get to a model space where the model has to extrapolate - which data-driven models are unsuited for.

Line 137, Figure 1: I personally don't think the figure to be well chosen, as the important aspects are missing. I would rather use a fogure that shows the catchment itself with its topography.

Line 138-143: These lines are unnecessary here and probably can be deleted. All those things have already been said within the introduction and are explained over the methodology section anyways.

Line 144-145: All steps, that are necessary for actual model runs should come after the model description. Otherwise, the order is confusing.

Line 146-164: The description of the whole model structure should be done after the individual parts are explained. Figure 2 also should be moved there.
I do recommend the inclusion of an additional efficiency measure like KGE, that is complementary to the other ones and also incorporates different aspects of the discharge like bias for example. Please also cite and mention, which version of the KGE you use then.
Why does the methodology end here? Important parts that come up later within the results part are missing. The methodology should explain that the final model is compared to certain baseline models and how they distinguish from the new model presented here. Furthermore, the whole part is missing about how the model is trained on the data, with how many runs, ending criterion, hyper parameters and so on.

Line 323-327: This is methodology and should not be within the results part - as it is missing within the methods section.

Line 328-329: As mentioned earlier, the baseline models cannot be newly introduced within the results.

Line 343-344: You can't conduct boxplots. Do you mean you conducted the model application 10 times?

Line 357: Figure 6 y axis seems to be cut, the numbers are partly missing

Line 361: I dont see how this represents the "spectrum of hydrological variability". From my understanding, it is more of a possibility to see, how the model performs if the data is only available in lesser resolution. How does this assess the depiction of the hydrological variability?

Line 405: Figure 8, are these for a aggregation period of 1 month?

Line 407-415: This is all methodology and not results.

Line 437-448: I dont think this part is really necessary here. The conclusion is not a whole summary of the paper, but points out the key findings again.

Line 455-456: The river discharge has a long memory? The sentence does not make sense. I feel like there is a more thorough discussion necessary of why the model shows this behaviour regarding the model efficiency for different aggregation periods - where the reason must be within model structure and how it fits the discharge pattern over time.

I generally feel like the discussion part is lacking depth. While I personally recommend to separate results and discussion, you can keep both together if it makes sense overall. But in the current state, the results lack depth regarding the explanation of observed model behaviour. For example, line 462-463: has this been the same for the application of other models? Is this a common problem? Like this, a few more citations and comparisons to other studies would help putting the paper within a broader context.
Also, I am currently missing a graphical depiction of the gauging curve and the simulated discharge. I believe a figure for that would help to give the reader an idea of how the model behaves, where it might deviate from gauging data and where it is strongly in congruence with it.

Minor comments:
Line 22: structure
Line 24: dominated by permafrost
Line 27: ...that these components improve the predictive performance.
Line 46: These temperature dependent transitions...?
Line 128-129: Why is there no citation for the Dataset?
Line 178: 1) Input expansion
Line 183-185: Kolmogorov-Arnold theorem while avoiding the computational overhead
Line 195: GELU
Line 196: Figure 3 not referenced within the text.
Line 200: ...mechanism and a hidden state, an LSTM can efficiently regulate...
Line 209: The memory cell of an LSTM is primarily composed...
Line 240: "Q refers the discharge prediction using the context vector calculated from the context vector." It has to be "refers to" and what is "using the context vector calculated from the context vector" supposed to mean?
Line 273: I recommend a semicolon after water.
Line 279: caused by sources, such as model simplifications...
Line 285-286: Maybe its better to reformulate the sentence and describe alpha and beta as parameters that have to be fitted through model application?
Line 299: beneficial
Line 303-304: What is cited here? The Nash-Sutcliffe efficiency measure should be properly cited.
Line 330: I would recommend to implement the name RCPIKLA of the new model earlier, instead of within the results.
Line 396: change "better captures"

Reply

Citation: https://doi.org/10.5194/egusphere-2025-3540-RC1
- RC2:
  'Reply on RC1', Anonymous Referee #2, 22 Nov 2025 reply
  Manuscript: A hybrid Kolmogorov-Arnold networks-based model with residual compensation and physics-informed constraints for Arctic River discharge prediction
  Journal: Hydrology and Earth System Sciences (HESS)
  Comments:
  1. One of the primary advantages of Kolmogorov-Arnold Networks is their enhanced interpretability compared to traditional MLPs. KAN is usually used to improve the interpretability of the relations between inputs and output, but there is no mention of that.
  
  The manuscript fails to leverage or discuss this fundamental strength of KAN architecture. Specifically, there is no:
  Visualization of the learned univariate functions
  
  Symbolic regression analysis
  
  Interpretation of what relationships the KAN component discovered between hydrometeorological inputs and Arctic discharge
  
  Physical insights into the processes governing snowmelt-driven streamflow in permafrost regions
  
  Include a dedicated subsection on KAN interpretability analysis containing:
  Visualization of learned activation functions for key input-output relationships
  
  Symbolic approximations of these functions where feasible (using symbolic regression tools available in KAN libraries)
  
  Physical interpretation of discovered patterns in the context of Arctic hydrology
  
  Comparison with known physical relationships in snowmelt hydrology from the literature
  
  2. what are the hyperparameters (epochs, batch size, learning rate) and details of the architecture of the RNN, GRU and other neural nets used for comparison.
  
  The manuscript lacks essential details for all baseline models (RNN, GRU, LSTM):
  No specification of hyperparameters (epochs, batch size, learning rate)
  
  No architectural details (number of layers, hidden units, activation functions)
  
  No information about initialization methods
  
  No training procedure details (optimizer type, learning rate schedules, dropout rates)
  
  No stopping criteria or early stopping procedures
  
  No hardware specifications or training times
  
  3. Recent papers suggest that KAN based architectures outperform classical ANN based architectures. There should have been a comparison with KAN based LSTM, GRU and other neural nets. The manuscript only compares RCPIKLA (which uses KAN) against traditional ANN-based models (RNN, GRU, LSTM), not against KAN-enhanced versions of these baseline architectures.
  
  The comparison with no physics informed constraints and no residual has been compared. However, the current experimental design still creates an attribution problem. Observed performance improvements could stem from:
  The KAN component specifically
  
  The attention mechanism
  
  The physics-informed constraints
  
  The residual compensation structure
  
  Seasonal trigonometric encoding
  
  Some synergistic combination of these components
  
  Without proper ablation comparing LSTM-attention/KAN-LSTM/KAN-GRU versus RCPIKLA, the specific contribution of KAN remains unclear.
  4. The manuscript describes a physics-informed constraint that imposes an upper limit on predicted snowmelt contribution but does not explain the asymmetric treatment of constraint violations.
  
  The asymmetric design requires clear physical justification:
  Upper bound rationale: Snowmelt contribution physically cannot exceed available snow water equivalent - this is a hard constraint based on mass conservation
  
  Lower bound question: Are underpredictions physically plausible? Could incomplete melting, refreezing, or sublimation make them valid? Or do they indicate model failure to capture melt processes?
  
  Bias implications: Does the asymmetric penalty introduce systematic bias toward underprediction?
  
  5. Physics-informed neural networks fundamentally rely on balancing multiple loss terms through weighting parameters. The manuscript mentions α and β as weights for MSE loss and physics loss but does not report their values.
  
  The manuscript must provide:
  Final α and β values used for all reported results
  
  Scenarios of hit and trials
  
  Search space explored
  
  6. The manuscript lacks visualization of epoch-wise loss decomposition, which is important for assessment of convergence of all models. Without this analysis, it is impossible to assess whether the physics constraint meaningfully guides training or becomes negligible compared to the data-driven MSE loss.
  Visualizing separate loss components reveals:
  Whether physics loss actually contributes to training or is overwhelmed by MSE loss
  
  Training stability and convergence behavior
  
  Potential issues: loss spikes, plateaus, phase transitions
  
  7. Figure 6 (left): "y axis seems to be cut, the numbers are partly missing" - this affects readability and interpretation. Also, please check for spelling and grammatical errors throughout manuscript. Like a few spelling mistakes have been observed in abstract.
  8. The physics-informed mechanism involves snow storage (S_t) and melt (M_t) terms that evolve over time. However, the manuscript does not specify:
  Initial values for S_0 and M_0 at the start of the simulation period
  
  How these initial conditions were integrated into the model?
  
  9. It is mentioned conducting 10 independent runs but provides unclear or incomplete reporting of variability in results. Fig8 represents the rmse and nse RCPIKLA variants with all predictions, what is the average RMSE over 10 runs, how much variation is observed over independent runs?
  
  Additionally:
  Figure 8 shows results (RMSE and NSE for RCPIKLA variants) but it's unclear whether these represent single runs, mean values, or distributions
  
  No explicit reporting of mean ± standard deviation for performance metrics
  
  No statistical significance testing comparing model variants
  
  10. Figure 5 currently shows model predictions at 12 time intervals (representing different aggregation windows) but does not convey prediction uncertainty across the 10 independent runs. This limits the reader's ability to assess:
  Model reliability at different temporal scales
  
  Whether certain aggregation intervals show higher prediction variance
  
  Summary:
  The manuscript is recommended for publication if the above suggestions are addressed or answered.
  
  Reply
  
  Citation: https://doi.org/10.5194/egusphere-2025-3540-RC2

Renjie Zhou and Shiqi Liu

Viewed

Total article views: 1,971 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,880	76	15	1,971	26	15

HTML: 1,880
PDF: 76
XML: 15
Total: 1,971
BibTeX: 26
EndNote: 15

Views and downloads (calculated since 08 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	1,751	10	5	1,766
Oct 2025	112	35	7	154
Nov 2025	17	31	3	51

Cumulative views and downloads (calculated since 08 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	1,751	10	5	1,766
Oct 2025	112	35	7	154
Nov 2025	17	31	3	51

Viewed (geographical distribution)

Total article views: 2,078 (including HTML, PDF, and XML) Thereof 2,078 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 22 Nov 2025

Short summary

This research develops a novel deep learning framework to predict streamflow in Arctic rivers dominated by frozen ground. This framework combines multiple advanced deep learning techniques with physics-based understanding of snow and ice processes. Tested on Russia's Kolyma River, it obtained more robust and accurate performance compared to existing methods, with each component contributing to improved performance.


Total:	0
HTML:	0
PDF:	0
XML:	0