https://doi.org/10.5194/egusphere-2026-272
29 Jan 2026
Status: this preprint is open for discussion and under review for Hydrology and Earth System Sciences (HESS).

Hydrochemistry and modeling nitrate concentration in farmland groundwater under different hydrological seasons by integrating hybrid quantum-classical ML, virtual sample generation and AlphaEarth Foundation

Junjie Xu, Xin Wei, Yilei Yu, Lihu Yang, Yuanzheng Zhai, Cuicui Lv, and Xianfang Song

Abstract. Precise seasonal prediction of groundwater nitrate concentrations in intensive agricultural areas is hampered by data sparsity, strong spatiotemporal heterogeneity, and complex hydro-biogeochemical processes. To address these issues, this study proposes an integrated prediction framework that combines hybrid quantum-classical machine learning, an advanced virtual sample generation strategy (t-SNE-GMM-KNN), and semantic embeddings from a remote-sensing foundation model (AlphaEarth Foundation, AEF). Models were built for the normal, dry, and wet seasons of 2022–2023 in Xiong'an New Area. Hydrochemical types were dominated by Ca-Mg-HCO₃ and controlled by mineral dissolution and evaporation. Nitrate concentrations were highest in the dry season (mean 42.93 mg L⁻¹), driven by evaporative concentration. Spatially, high-value zones occurred in the southeast in the normal season, in the central area in the dry season, and in the northwest in the wet season. Isotope-based MixSIAR modeling identified domestic sewage and livestock manure (74.1 %) as the dominant sources. The t-SNE-GMM-KNN strategy mitigated small-sample bias while preserving the nonlinear structure of the data. When the sample set was expanded tenfold with virtual samples, the dry-season Random Forest R² increased from 0.284 to > 0.85. Furthermore, a hybrid quantum-classical Random Forest showed superior robustness to data sparsity, achieving peak performance in the normal season (R² = 0.962, RMSE = 5.73 mg L⁻¹). Additionally, models using only AEF embeddings achieved screening-level accuracy (R² up to 0.860), offering a feasible rapid-survey approach for extensive unmonitored regions. Correlation analysis identified TDS and EC as persistent top predictors (r > 0.8). This comprehensive framework offers a robust solution for seasonal nitrate prediction and sustainable water management.
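The abstract names the t-SNE-GMM-KNN strategy but does not spell out its steps. One plausible reading, sketched here under stated assumptions and not as the authors' implementation, has four steps. First, embed standardized samples in 2-D with t-SNE. Second, fit a Gaussian mixture model (GMM) to the embedding to capture its modes. Third, draw new points from the GMM. Fourth, map each new point back to feature space as an inverse-distance-weighted average of its k nearest real samples (KNN). Several details are assumptions: the helper name `tsne_gmm_knn_augment`, BIC-based selection of the number of components, the perplexity rule, and reading "tenfold" as `fold * n` virtual rows. The code uses scikit-learn's `TSNE`, `GaussianMixture`, and `NearestNeighbors`.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors


def tsne_gmm_knn_augment(X, fold=10, k=5, max_components=6, seed=0):
    """Generate fold * len(X) virtual rows (hypothetical helper, not the authors' code).

    X: (n, d) array of real samples. Include the nitrate target as one column
    so that every virtual row also carries a target value.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Standardize so that no single variable (e.g. EC in uS/cm) dominates the embedding.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd

    # 1) t-SNE: 2-D embedding that preserves local, nonlinear neighbourhoods.
    #    Assumed rule: perplexity scaled to the sample size and kept below n.
    perplexity = min(30.0, max(5.0, (n - 1) / 3))
    Z = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(Xs)

    # 2) GMM on the embedding. BIC chooses the number of modes (an assumption).
    candidates = [
        GaussianMixture(n_components=c, covariance_type="full", random_state=seed).fit(Z)
        for c in range(1, min(max_components, n) + 1)
    ]
    gmm = min(candidates, key=lambda m: m.bic(Z))

    # 3) Sample new points in the embedding space from the fitted mixture.
    Z_new, _ = gmm.sample(fold * n)

    # 4) KNN back-mapping. t-SNE has no inverse transform, so each virtual point
    #    becomes an inverse-distance-weighted average of its k nearest real samples.
    nn = NearestNeighbors(n_neighbors=min(k, n)).fit(Z)
    dist, idx = nn.kneighbors(Z_new)
    w = 1.0 / (dist + 1e-9)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("mk,mkd->md", w, X[idx])  # weighted mean in original units
```

Each virtual row is a convex combination of real rows, so it stays within the observed range of every variable. This avoids physically impossible values such as negative concentrations, at the cost of never extrapolating beyond the measured data.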


Status: open (until 12 Mar 2026)


Latest update: 30 Jan 2026
Short summary
The proposed t-SNE-GMM-KNN virtual sample generation raises the dry-season Random Forest R² from 0.28 to > 0.85 while preserving the multimodal structure of the data. Total Dissolved Solids (TDS), Electrical Conductivity (EC), and Salinity are consistently identified as the top predictive factors across hydrological seasons.
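The predictor screening described above ranks candidate variables such as TDS and EC by their correlation with nitrate, and the abstract reports r > 0.8 for the top ones. That screening can be outlined by ranking variables within each hydrological season. The sketch assumes Pearson's r, because the correlation type is not stated. The helper `rank_predictors` and its `threshold` argument are hypothetical.

```python
import numpy as np


def rank_predictors(features, target, threshold=0.8):
    """Rank predictors by |Pearson r| with the target (hypothetical helper).

    features: dict mapping a variable name (e.g. "TDS", "EC") to a 1-D array.
    target:   1-D array of nitrate concentrations (mg L^-1) of the same length.
    Returns (name, r, exceeds_threshold) tuples sorted by |r|, largest first.
    """
    y = np.asarray(target, dtype=float)
    ranked = []
    for name, values in features.items():
        x = np.asarray(values, dtype=float)
        r = float(np.corrcoef(x, y)[0, 1])  # off-diagonal entry = Pearson r
        ranked.append((name, r, abs(r) > threshold))
    ranked.sort(key=lambda t: abs(t[1]), reverse=True)
    return ranked
```

Call it once per season (normal, dry, wet) on that season's samples. A variable that ranks near the top in all three seasons is a persistent predictor in the sense used above.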