Hydrochemistry and modeling nitrate concentration in farmland groundwater under different hydrological seasons by integrating hybrid quantum-classical ML, virtual sample generation and AlphaEarth Foundation
Abstract. Precise seasonal prediction of groundwater nitrate concentrations in intensive agricultural areas is hindered by data sparsity, strong spatiotemporal heterogeneity, and complex hydro-biogeochemical processes. To address these issues, this study proposes an integrated prediction framework that combines hybrid quantum-classical machine learning, a virtual sample generation strategy (t-SNE-GMM-KNN), and semantic embeddings from the AlphaEarth Foundation (AEF) remote sensing foundation model. Modeling was conducted for the normal, dry, and wet seasons of 2022–2023 in the Xiong'an New Area. Hydrochemical types were dominated by Ca-Mg-HCO₃⁻, controlled by mineral dissolution and evaporation. Nitrate concentrations were highest in the dry season (mean 42.93 mg L⁻¹), driven by evaporative concentration. Spatially, the high-concentration zones shifted between seasons: southeast (normal), central (dry), and northwest (wet). Isotope-based MixSIAR modeling indicated that domestic sewage and livestock manure were the dominant sources (74.1 %). The t-SNE-GMM-KNN strategy mitigated small-sample bias while preserving the nonlinear structure of the data. When the training set was augmented tenfold with virtual samples, the Random Forest R² in the dry season increased from 0.284 to > 0.85. Furthermore, a hybrid quantum-classical Random Forest was more robust to data sparsity, achieving its peak performance in the normal season (R² = 0.962, RMSE = 5.73 mg L⁻¹). Models using only AEF embeddings achieved screening-level accuracy (R² up to 0.860), providing a feasible rapid survey scheme for extensive unmonitored regions. Correlation analysis identified total dissolved solids (TDS) and electrical conductivity (EC) as persistent top predictors (r > 0.8). The framework offers a robust approach to seasonal nitrate prediction in support of sustainable water management.
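For readers less familiar with the two reported skill scores, the following is a minimal sketch of how R² (coefficient of determination) and RMSE are conventionally computed from observed and predicted concentrations. It assumes the standard textbook definitions; the abstract does not state which implementation the authors used, and the nitrate values below are hypothetical illustrations, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g. mg/L)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical nitrate concentrations in mg/L (illustration only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.3f} mg/L")
print(f"R2   = {r2_score(observed, predicted):.3f}")
```

Because RMSE carries the units of the target variable, a value such as 5.73 mg L⁻¹ is best read against the seasonal mean concentration, whereas R² is dimensionless and comparable across seasons.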