Use of nonlinear principal components of CHIRPS precipitation data and ocean-atmospheric variables for streamflow forecasting in an area of scarce data. Case study, Tocaría river basin – Orinoquia Colombiana

Sarria-Ospina, Jhon Derly; Ocampo-Marulanda, Camilo; Ceron-Aramburo, Lina Maria; Canchala, Teresita; Ferreira, Tiago Alessandro

doi:10.5194/egusphere-2025-3694

Preprints

https://doi.org/10.5194/egusphere-2025-3694

29 Sep 2025

Jhon Derly Sarria-Ospina, Camilo Ocampo-Marulanda, Lina Maria Ceron-Aramburo, Teresita Canchala, and Tiago Alessandro Ferreira

Abstract. Accurate streamflow forecasting is critical for mitigating the impacts of hydrological extremes and guiding sustainable water resource management, particularly in poorly gauged tropical catchments. This study presents a hybrid forecasting framework that integrates Neural Network Seasonal Autoregressive Integrated Moving Average using exogenous variables (NN-SARIMAX) models with nonlinear principal components (NLPCs) derived from CHIRPS precipitation data, and large-scale ocean–atmosphere indices (macroclimatic variables, MVs). Four monthly models were developed and tested for the Tocaría River basin in the Colombian Orinoquía region: (1) a baseline SARIMA (4,0,4) (0,0,3)₁₂ model; (2) SARIMAX with exogenous MVs; (3) NN-SARIMAX with NLPCs; and (4) a hybrid NN-SARIMAX combining both MVs and NLPCs. The hybrid model achieved the best performance with an R² of 0.78 during the validation period. These results underscore the effectiveness of integrating local precipitation variability and large-scale climatic drivers to enhance forecast accuracy under data-scarce conditions. The proposed methodology offers a transferable approach for operational forecasting in ungauged or sparsely monitored basins, contributing to early warning systems, drought preparedness, and adaptive water governance in vulnerable tropical regions.
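As a reading aid for the order notation of the baseline model, the following minimal sketch lists which lags a multiplicative SARIMA(4,0,4)(0,0,3)₁₂ specification involves. It assumes the standard Box-Jenkins convention (p,d,q)(P,D,Q)ₛ; the function name `sarima_lags` is hypothetical and is not taken from the preprint. The differencing orders are ignored, which is harmless here because both are zero.

```python
from itertools import product

def sarima_lags(order, seasonal_order, period):
    """Return the AR and MA lags implied by a multiplicative SARIMA order.

    Differencing orders (d, D) are unpacked but not used in this sketch.
    """
    p, d, q = order
    P, D, Q = seasonal_order
    # Multiplying the non-seasonal polynomial (lags 0..p) by the seasonal
    # polynomial (lags 0, s, 2s, ...) yields every pairwise sum of lags.
    ar = {i + period * k for i, k in product(range(p + 1), range(P + 1))}
    ma = {j + period * k for j, k in product(range(q + 1), range(Q + 1))}
    # Lag 0 is the leading 1 of each polynomial, not a fitted coefficient.
    ar.discard(0)
    ma.discard(0)
    return sorted(ar), sorted(ma)

ar_lags, ma_lags = sarima_lags((4, 0, 4), (0, 0, 3), 12)
print("AR lags:", ar_lags)
print("MA lags:", ma_lags)
```

Under this convention the model uses autoregressive lags 1 to 4 and moving-average terms at lags 1 to 4, 12 to 16, 24 to 28, and 36 to 40 months. The terms beyond lag 4 come from three seasonal coefficients and their products with the four non-seasonal ones, so they are not free parameters.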

Received: 30 Jul 2025 – Discussion started: 29 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 3063 KB)

Supplement (78 KB)

Status: final response (author comments only)

RC1: 'Comment on egusphere-2025-3694', Anonymous Referee #1, 26 Oct 2025
Dear Authors,
Thank you very much for submitting your research article to NHESS. The manuscript addresses an important topic and is clearly written, with a coherent and logical flow. However, there are a few minor issues that need to be addressed, as outlined below.
Section 2.5:
A detailed description of the neural network parameters and of the NN-SARIMAX architecture would help readers understand the structure of the network.

Section 3.2.1:
It is stated that two nonlinear principal components (explaining 92.5% and 7.4% of the variance, respectively) were selected out of a total of 81 components, together explaining 99.9% of the variance. Retaining nearly all of the variance may lead to overfitting unless the number of components is chosen through cross-validation or another model selection criterion (e.g., AIC or BIC). It should therefore be clearly explained how potential overfitting was assessed and mitigated.

It is also unclear why only the first two principal components account for 99.9% of the variance, while the remaining 79 components contribute only 0.1%. This large discrepancy warrants further clarification.
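To illustrate how such a split of variance can be checked, the sketch below computes explained-variance ratios for a synthetic matrix of 81 highly correlated columns. It rests on loud assumptions: it uses ordinary linear PCA (via SVD) as a stand-in for the nonlinear PCA of the manuscript, the data are random numbers rather than CHIRPS precipitation, and the names `explained_variance_ratio`, `common`, and `loadings` are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each linear principal component."""
    Xc = X - X.mean(axis=0)                  # center each column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    var = s ** 2                             # proportional to component variances
    return var / var.sum()

rng = np.random.default_rng(0)
n_months, n_cells = 240, 81

# Synthetic field: one shared signal scaled per column, plus small local noise.
common = rng.normal(size=n_months)
loadings = rng.uniform(0.8, 1.2, size=n_cells)
X = np.outer(common, loadings) + 0.1 * rng.normal(size=(n_months, n_cells))

ratio = explained_variance_ratio(X)
print("first two components:", ratio[:2])
print("cumulative after two:", ratio[:2].sum())
```

When all columns share one dominant signal, the leading component absorbs almost all of the variance and the remaining ones are left with very little. Whether this mechanism explains the reported percentages cannot be determined from the synthetic example.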

Section 2.4.3:
The approach used to address multicollinearity is generally sound and viable. However, there is no clear evidence that the multicollinearity issue has been fully resolved, such as Variance Inflation Factor (VIF) values recalculated after the iterative removal of collinear and less important predictors. Many of the retained variables still exhibit extremely high VIF values (e.g., NINO4 = 29,126; NINO12 = 2,492; NP = 17,705; TNA = ∞; and TSA = ∞). It remains unclear whether multicollinearity persists among these nine predictors.
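The recalculation asked for here can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the predictors are synthetic, the function name `vif` and the arrays `a`, `b`, `c` are hypothetical, and the VIF is taken as 1 / (1 - R²) from regressing each predictor on the others with an intercept.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        # Guard against division by zero when the fit is exact.
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)   # nearly a copy of a

vif_before = vif(np.column_stack([a, b, c]))
vif_after = vif(np.column_stack([a, b]))   # recomputed after dropping c
print("before removal:", vif_before)
print("after removal: ", vif_after)
```

In this toy case the near-duplicate pair produces very large VIFs that fall to about 1 once one member is dropped. Reporting the recomputed values for the retained predictors would show directly whether the same holds for the final model.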
Citation: https://doi.org/10.5194/egusphere-2025-3694-RC1
RC2: 'Comment on egusphere-2025-3694', Anonymous Referee #2, 20 Nov 2025

See the attached review.

Citation: https://doi.org/10.5194/egusphere-2025-3694-RC2

Supplement

https://doi.org/10.5194/egusphere-2025-3694-supplement

Data sets

Chris Funk et al.: The climate hazards infrared precipitation with stations – a new environmental record for monitoring extremes, https://doi.org/10.1038/sdata.2015.66

Viewed

Total article views: 502 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
365	122	15	502	22	9	10

Views and downloads (calculated since 29 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	159	6	1	166
Oct 2025	118	14	5	137
Nov 2025	55	26	8	89
Dec 2025	30	73	1	104
Jan 2026	3	3	0	6

Latest update: 08 Jan 2026

Short summary

A method was developed to improve river flow predictions in tropical regions with limited measurements. By combining local rainfall patterns with global climate signals through a hybrid machine learning model, the forecasting accuracy was enhanced. This approach can support water managers in making more informed decisions during dry periods. The proposed method offers a straightforward way to enable early warnings in data-scarce regions.

