the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Conditional updates of neural network weights for increased out-of-training performance
Abstract. This study proposes a method to enhance neural network performance when the training data and the application data differ substantially, e.g., in out-of-distribution problems and under pattern or regime shifts. The method consists of three main steps: 1) retrain the neural network on reasonable subsets of the training data set and record the resulting weight anomalies; 2) choose reasonable predictors and derive a regression between the predictors and the weight anomalies; 3) extrapolate the weights, and thereby the neural network, to the application data. We demonstrate and discuss this method in three nonlinear use cases from the climate sciences, including successful temporal, spatial and cross-domain extrapolations of neural networks.
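The three steps in the abstract can be illustrated with a minimal numerical sketch. Everything below is an assumption for illustration only: the weights are a stand-in vector rather than a real network, the subsets' weight anomalies are simulated as a linear drift in a single scalar predictor, and the regression is an ordinary least-squares fit. The manuscript's actual models, predictors, and regression choices may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 0: "parent" weights from training on the full data set (stand-in vector).
parent_weights = rng.normal(size=8)

# Step 1: retrain on subsets of the training data and record weight anomalies.
# Here the retrained weights are simulated as the parent weights plus a
# predictor-dependent perturbation (a toy linear drift).
predictors = np.array([0.1, 0.3, 0.5, 0.7])           # one scalar predictor per subset
true_slope = rng.normal(size=8)                        # how each weight drifts
subset_weights = parent_weights + np.outer(predictors, true_slope)
anomalies = subset_weights - parent_weights            # weight anomalies

# Step 2: linear regression of each weight anomaly on the predictor.
X = np.vstack([np.ones_like(predictors), predictors]).T
coef, *_ = np.linalg.lstsq(X, anomalies, rcond=None)   # shape (2, n_weights)

# Step 3: extrapolate the weights to a predictor value outside the training range,
# yielding the updated network for the application data.
x_new = 1.2
extrapolated = parent_weights + coef[0] + coef[1] * x_new
```

In this noiseless toy setting the fit recovers the drift exactly; in practice the regression would be applied per weight (or per group of weights) with whatever predictors and regression model suit the use case.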
Status: open (until 20 Apr 2026)
- RC1: 'Comment on egusphere-2026-728', Anonymous Referee #1, 12 Mar 2026
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 125 | 0 | 1 | 126 | 0 | 0 |
This manuscript describes an approach to improve neural network (NN) predictions in settings where the data on which the NN is trained differ (substantially) from those in the setup to which the NN will ultimately be applied. This situation is frequently encountered in earth sciences (and presumably other areas of application), and thus this research topic is highly relevant to applications.
General comment:
The proposed approach to address this issue is more a general procedure than a specific method, with many modeling choices left to the user depending on the setup in which the approach is applied. I understand that this is inevitable, but I would then expect a solid theoretical motivation of the proposed approach that gives a good understanding of when and why it works. I feel that this is still lacking in the current version of the manuscript.

While the three use cases are good examples to demonstrate that the proposed approach can yield notable improvements, while also discussing shortcomings and open questions, I must admit that I am somewhat puzzled that the proposed procedure works at all. In the introduction, the authors motivate this research by explaining that in non-linear problems, a systematic shift in the input data does not easily translate to a corresponding shift in the output. But isn't this also true for the NN weights? In my understanding, NN weights are typically not even identifiable, i.e., completely different weights can yield the same (or at least very similar) output. So, how is it possible that these weights can be extrapolated by (e.g. linear) methods in a stable way? Intuitively, I would expect this to work only if the weights of ParentModeli are guaranteed to remain relatively close to those of ParentModel0, but it doesn't seem that the authors tried to control this in any way.

Please provide more motivation and explanation (which could, e.g., include a more detailed analysis and visualization of the weight extrapolation underlying the different curves in Fig. 1) of why you expect this approach to work in general, or which conditions have to be satisfied for this weight extrapolation to work. If any tuning parameters were involved in obtaining the positive results shown in the three use cases, it would be useful to share this experience to help guide the application of this approach in different settings.
Specific comments:
4.1, 2nd paragraph, 'Note, however, that the sensitivity ...': Can you explain this statement further, i.e., in what way do the activation functions affect the sensitivity of the weights? More generally, this entire paragraph seems to assume good familiarity with this type of problem and is somewhat short on details.
4.1, 3rd paragraph, 'As there is a clear order ...': This sounds like a substantial departure from the proposed approach. Doesn't this imply that, instead of anomalies from ParentModel0, the weight regression is applied to the increments corresponding to subsequent time points? Please clarify.
4.1, 3rd paragraph, '... every weight and bias of the CNN ...': So, including the weights of the convolution layers? It is again surprising and somewhat counter-intuitive to me that the weights of a convolution layer can be (linearly) extrapolated in a meaningful way. Is there any way to visualize the evolution of just the first convolution layer's weights over time, and the associated extrapolation to after the tipping event? Is it possible to quantify in which layer the most impactful extrapolation happens, i.e., the one that improves the NN's performance during and at the tipping event?
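The diagnostic suggested in the comment above could be sketched along the following lines. This is purely illustrative: the per-weight trajectories of a first 3x3 convolution kernel are simulated here as a linear drift with small noise, and each weight is fitted and extrapolated independently with a degree-1 polynomial; none of this reflects the manuscript's actual data or code.

```python
import numpy as np

rng = np.random.default_rng(1)

t = np.arange(5, dtype=float)             # retraining time points
n_kernel = 3 * 3                          # weights of a single 3x3 convolution kernel
# simulated weight trajectories: per-weight linear drift plus small noise
slopes = rng.normal(scale=0.1, size=n_kernel)
starts = rng.normal(size=n_kernel)
traj = starts[None, :] + np.outer(t, slopes) \
    + rng.normal(scale=0.01, size=(len(t), n_kernel))

# per-weight linear fit over the observed time points
fits = np.polyfit(t, traj, deg=1)         # shape (2, n_kernel): slope, intercept

# extrapolation to a later time point, e.g. after the tipping event
t_new = 7.0
extrap = fits[0] * t_new + fits[1]

# a simple scalar summary of how far each weight is expected to move
delta = extrap - traj[-1]
```

Plotting `traj` against `t` with the extrapolated points appended would give exactly the kind of visualization asked for; summing `|delta|` per layer would be one crude way to quantify where the most impactful extrapolation happens.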
4.2, 1st paragraph, '... only individual xi': Can you explain this further, i.e., a) clarify what you mean by individual xi, b) explain what you mean by regularization towards the entire training data set (I believe this has never been explained in detail), and c) explain why you chose a different approach here.