Hydrochemistry and modeling nitrate concentration in farmland groundwater under different hydrological seasons by integrating hybrid quantum-classical ML, virtual sample generation and AlphaEarth Foundation
Abstract. Precise seasonal prediction of groundwater nitrate concentrations in intensive agricultural areas faces challenges such as data sparsity, strong spatiotemporal heterogeneity, and complex hydro-biogeochemical processes. To address these issues, this study proposes an integrated prediction framework combining hybrid quantum-classical machine learning, advanced virtual sample generation (t-SNE-GMM-KNN), and remote sensing foundation model semantic embedding (AEF). Modeling was conducted across the 2022–2023 normal, dry, and wet seasons in Xiong'an New Area. Hydrochemical types were dominated by Ca-Mg-HCO₃⁻, controlled by mineral dissolution and evaporation. Nitrate concentrations were highest in the dry season (mean 42.93 mg L⁻¹), driven by evaporative concentration. Spatially, high-value zones shifted: southeast (normal), central (dry), and northwest (wet). MixSIAR modeling based on isotopes indicated domestic sewage and livestock manure (74.1 %) as dominant sources. The t-SNE-GMM-KNN strategy mitigated small-sample bias while preserving nonlinear structure. When virtual samples were augmented to 10-fold, the Random Forest R² in the dry season increased from 0.284 to > 0.85. Furthermore, a hybrid quantum-classical Random Forest exhibited superior robustness for data sparsity, achieving peak performance in the normal season (R² = 0.962, RMSE = 5.73 mg L⁻¹). Additionally, using only AEF embeddings achieved screening-level accuracy (R² up to 0.860), providing a feasible rapid survey scheme for extensive unmonitored regions. Correlation analysis identified TDS and EC as persistent top predictors (r > 0.8). This comprehensive framework offers a robust solution for seasonal nitrate prediction and sustainable water management.
This manuscript presents a highly innovative and timely contribution to hydrogeoscience research, offering a well-integrated framework that combines hydrochemical analysis, isotopic source apportionment, virtual sample generation, and hybrid quantum–classical machine learning to predict seasonal groundwater nitrate concentrations. The authors successfully link hydrogeochemical processes with predictive modeling and interpretability (via SHAP and Bayesian analyses), thereby enhancing both scientific insight and practical relevance. The focus on seasonal nitrate dynamics in an intensively cultivated region further strengthens the manuscript’s significance for water resources management. Overall, the work aligns exceptionally well with the scope and standards of HESS, particularly in its emphasis on process understanding, methodological novelty, and interdisciplinary integration.
1) The manuscript presents an interesting integration of virtual sample generation and hybrid quantum–classical ML. However, the authors should more explicitly differentiate their contribution from existing ML-based groundwater quality prediction studies. A short paragraph clearly stating what is fundamentally new (beyond combining known techniques) would strengthen the paper.
2) The t-SNE–GMM–KNN augmentation strategy is promising, but the validation of virtual samples is described rather briefly. Please provide additional quantitative diagnostics (e.g., distribution similarity metrics, KS test, or comparison of covariance structures) to demonstrate that synthetic data do not introduce bias.
3) The reported improvement in R² after 10× augmentation is impressive. The authors should discuss potential risks of overfitting to synthetic patterns and comment on how the framework might generalize to other regions with different hydrogeochemical settings.
4) The manuscript would benefit from acknowledging recent developments in quantum approaches for hydrology. I strongly recommend citing the following work, which provides useful methodological context: "HydroQuantum: A new quantum-driven Python package for hydrological simulation".
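To make comment 2 concrete, the kind of diagnostic I have in mind could look like the minimal sketch below. It is not the authors' pipeline: the function name `compare_real_and_virtual`, the feature labels, and the randomly generated stand-in arrays are all hypothetical, and the virtual samples here are simply drawn from the same distribution rather than produced by t-SNE-GMM-KNN. The sketch runs a two-sample Kolmogorov-Smirnov test per feature and reports a relative Frobenius distance between the correlation matrices of the measured and virtual sets.

```python
import numpy as np
from scipy.stats import ks_2samp


def compare_real_and_virtual(real, virtual, feature_names):
    """Compare measured and virtual samples feature by feature.

    Returns a dict {feature: (KS statistic, p-value)} and the relative
    Frobenius distance between the two correlation matrices.
    """
    real = np.asarray(real, dtype=float)
    virtual = np.asarray(virtual, dtype=float)

    # Marginal check: two-sample KS test for each feature column.
    ks_results = {}
    for j, name in enumerate(feature_names):
        res = ks_2samp(real[:, j], virtual[:, j])
        ks_results[name] = (float(res.statistic), float(res.pvalue))

    # Joint check: how far apart are the correlation structures?
    corr_real = np.corrcoef(real, rowvar=False)
    corr_virtual = np.corrcoef(virtual, rowvar=False)
    corr_gap = np.linalg.norm(corr_real - corr_virtual) / np.linalg.norm(corr_real)

    return ks_results, float(corr_gap)


if __name__ == "__main__":
    # Hypothetical stand-in data, NOT values from the manuscript.
    rng = np.random.default_rng(0)
    cov = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 1.0]]
    real = rng.multivariate_normal(np.zeros(3), cov, size=40)      # measured set
    virtual = rng.multivariate_normal(np.zeros(3), cov, size=400)  # 10-fold set

    ks, gap = compare_real_and_virtual(real, virtual, ["TDS", "EC", "NO3"])
    for name, (stat, p) in ks.items():
        print(f"{name}: KS statistic = {stat:.3f}, p-value = {p:.3f}")
    print(f"Relative correlation-matrix distance = {gap:.3f}")
```

Large KS statistics with small p-values, or a large correlation-matrix distance, would indicate that the virtual samples distort the marginal distributions or the inter-variable structure. Reporting such numbers per season, and computing them only against measured samples that were not used to fit the generator, would also help address the overfitting concern raised in comment 3.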