the Creative Commons Attribution 4.0 License.
A Multi-chain Surrogate-assisted Hybrid Optimization Framework for Joint Identification of Groundwater Contaminant Sources and Hydrogeological Parameters
Abstract. Rapid and accurate identification of groundwater contaminant information and hydrogeological parameters is crucial for effective groundwater remediation and risk management. Within a simulation-optimization framework, this task is inherently posed as a mixed-variable optimization problem involving discrete parameters (e.g., source locations) and continuous ones (e.g., hydraulic heads, conductivities, and release fluxes). However, several challenges arise in this context. First, conventional optimization algorithms often exhibit slow convergence and unstable performance. Second, they typically require thousands of simulations to adequately explore the complex parameter space, resulting in prohibitive computational costs. To address these issues, this study develops a surrogate-assisted hybrid algorithm that integrates the Cooperative Search Algorithm (CSA) and Tabu Search (TS) within a synergistic multi-chain optimization framework, termed SA-CSA-TS. In each iteration, individual chains first perform independent CSA-based optimization to promote broad global exploration, after which they collaboratively refine source locations through a neighbourhood search guided by a shared tabu list. In addition, surrogate models equipped with a reconstruction strategy partially replace groundwater simulations, thereby substantially reducing the computational burden. Case studies reveal that the Radial Basis Function (RBF) outperforms other mainstream surrogate models in both accuracy and stability. Furthermore, comparative experiments confirm that the proposed SA-CSA-TS framework not only achieves higher solution accuracy but also significantly reduces computational demand, demonstrating strong potential for efficient groundwater contamination diagnosis.
Status: final response (author comments only)
CC1: 'Comment on egusphere-2025-6140', Giacomo Medici, 19 Dec 2025
AC1: 'Reply on CC1', Qingyun Duan, 20 Mar 2026
We appreciate the reviewer's positive evaluation of our work and the valuable suggestions provided. All the comments are constructive and will help improve the clarity and presentation of the manuscript. A detailed point-by-point response is provided in the attached supplement.
CC2: 'Comment on egusphere-2025-6140', Nima Zafarmomen, 25 Dec 2025
The study introduces a significant advancement in the simulation-optimization (S−O) framework for groundwater contamination diagnosis. The novelty of the SA-CSA-TS framework lies in its synergistic multi-chain architecture. Unlike traditional single-population algorithms, the integration of the Cooperative Search Algorithm (CSA) for global exploration and Tabu Search (TS) for local refinement—guided by a shared tabu list—effectively addresses the "equifinality" and multimodality inherent in mixed-variable groundwater problems.
Furthermore, the systematic evaluation of surrogate models and the implementation of a dynamic reconstruction strategy (achieving an 85–88% reduction in computational demand) provides a highly practical blueprint for real-world remediation efforts where time and computing resources are limited.
Minor Comments:
1. In Section 3.4, the authors describe the update rules for the tabu list. It would be beneficial to briefly clarify the "tenure" or size of the tabu list. Does the list have a maximum capacity, or does it grow indefinitely throughout the FEmax iterations?
2. While the authors mention using default settings in UQPyL, the performance of Kriging and Gaussian Processes is often highly sensitive to the choice of kernel/correlation functions. A brief sentence justifying the choice of the Cubic RBF kernel over others (like Thin Plate Spline) would add more depth to the surrogate comparison section.
3. In the discussion of the "parameter-compensation effect" (Section 7.2), the authors correctly identify that multiple locations can yield similar concentrations. It might be helpful to suggest how monitoring well placement (optimal experimental design) could potentially reduce this equifinality in future iterations of the framework.
4. In Figure 16 and Figure 18 (Radar Charts), the overlap of the GA and CSA lines can be difficult to distinguish. Consider using slightly different line textures (e.g., dashed vs. dotted) to improve accessibility for the reader.
5. To broaden the impact of the study, the authors should consider how this framework interacts with broader hydrological cycles and diverse data sources. I strongly recommend that the authors consider and potentially reference studies such as "Assimilation of sentinel‐based leaf area index for modeling surface‐ground water interactions in irrigation districts". This would help contextualize how satellite-derived data and surface-water interactions might provide additional constraints on the groundwater simulation models, potentially refining the identification of hydrogeological parameters.
Citation: https://doi.org/10.5194/egusphere-2025-6140-CC2 -
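To make the kernel question in Comment 2 concrete, the following is a minimal sketch of RBF interpolation in which the basis function is a swappable argument, so that a cubic kernel and a thin plate spline can be compared on the same samples. It is not the UQPyL implementation used in the manuscript: the names `fit_rbf`, `cubic`, `thin_plate_spline` and `solve`, the linear polynomial tail, and the toy data are all assumptions made for illustration.

```python
import math

def cubic(r):
    """Cubic basis function, phi(r) = r^3."""
    return r ** 3

def thin_plate_spline(r):
    """Thin plate spline basis, phi(r) = r^2 log r, with phi(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_rbf(points, values, basis):
    """Fit an RBF interpolant with a linear polynomial tail (hypothetical helper)."""
    n, d = len(points), len(points[0])
    A = []
    # Upper block rows: [Phi | P], where P holds the terms 1, x_1, ..., x_d.
    for p in points:
        A.append([basis(math.dist(p, q)) for q in points] + [1.0, *p])
    # Lower block rows: [P^T | 0], the side conditions on the weights.
    for k in range(d + 1):
        col = [1.0 if k == 0 else p[k - 1] for p in points]
        A.append(col + [0.0] * (d + 1))
    coef = solve(A, list(values) + [0.0] * (d + 1))
    w, c = coef[:n], coef[n:]

    def predict(x):
        radial = sum(wi * basis(math.dist(x, q)) for wi, q in zip(w, points))
        return radial + c[0] + sum(ck * xk for ck, xk in zip(c[1:], x))

    return predict

# Toy data (assumed): a 3 x 3 grid of samples of a smooth test function.
points = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]
values = [math.sin(2.0 * x) + y * y for x, y in points]

cubic_model = fit_rbf(points, values, cubic)
tps_model = fit_rbf(points, values, thin_plate_spline)
```

Both models reproduce the training samples, so any difference between kernels only shows up at held-out points; comparing the two on a validation set is one way to justify the kernel choice the comment asks about.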
AC2: 'Reply on CC2', Qingyun Duan, 28 Mar 2026
We sincerely thank the reviewer for these thoughtful and constructive comments. We agree that these suggestions are valuable and will help strengthen the manuscript.
For Comment 1, we will clarify that the maximum capacity of the tabu list is set to be the same as the number of potential contamination source locations. In addition, we have defined explicit update rules for the tabu list, as described in Lines 270–275.
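As an illustration of what a capacity-limited tabu list looks like in practice, here is a minimal sketch. It assumes first-in, first-out eviction and grid-cell tuples as source locations; the update rules actually used in the manuscript (Lines 270–275) are not reproduced here, and the class name `TabuList` is hypothetical.

```python
from collections import deque

class TabuList:
    """Bounded tabu list; the oldest entry is dropped once capacity is reached.

    FIFO eviction is an assumption of this sketch, not the manuscript's rule.
    """

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, location):
        # Re-adding a location refreshes its position instead of duplicating it.
        if location in self._items:
            self._items.remove(location)
        self._items.append(location)

    def __contains__(self, location):
        return location in self._items

    def __len__(self):
        return len(self._items)

# Toy usage with hypothetical (row, column) source locations.
tabu = TabuList(capacity=3)
for location in [(1, 1), (2, 5), (4, 4), (7, 2)]:
    tabu.add(location)
```

With a capacity of 3, the fourth insertion evicts the first location, so the list never grows beyond its stated maximum.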
For Comment 2, we will conduct additional numerical experiments. A detailed response will be provided in the subsequent submission.
For Comments 3–5, we thank the reviewer for these constructive comments. We will take them into careful consideration and incorporate corresponding revisions in the revised manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-6140-AC2
RC1: 'Comment on egusphere-2025-6140', Anonymous Referee #1, 25 Mar 2026
A new efficient calculation procedure was developed for estimating the input parameters for groundwater model simulation to explore the locations of contamination sources. The authors demonstrated the step-by-step performance evaluation of the procedure through hypothetical field conditions to near real-world situations. Although the manuscript contains several points of uncertainty or illegibility, as noted below, the overall structure makes it possible to understand the authors’ great work as described above.
・ In Subchapter 4.3, the authors described Case 3 as a real-world groundwater problem; however, insufficient information was provided for the regional groundwater flow system. At a minimum, “aquifer structure (number of layers),” “number of grid cells in the depth direction,” “water input to the aquifer (i.e., presence or absence of local precipitation),” and “reasonable information to define the regions as four; for example, geological information except the northern and southern boundary areas as mentioned” would make it easier to agree with the process by which the model was discretized from the actual groundwater flow field to the structure necessary to achieve the objectives.
・The title of Chapter 3 is inconsistent with the other chapter titles because it includes the specific names of algorithms (SA-CSA-TS). Abstracting the title, for example, to “Overview of the Developed Algorithms,” would make the overall context of the article easier to understand.
・The manuscript contained many figures, which reduced readability. It would be more readable to consolidate Figures 14, 15, and 17, which are similar; a multi-panel figure with branch numbers 14a, 14b, and 14c can be used. This approach could also be applied to pairs of figures that are “of the same type but depict different experimental cases,” such as Figures 12 and 13 or Figures 10 and 11. If possible, the authors should consider categorizing the figures into those that are strongly relevant to the derivation of conclusions and those that are not; the latter should be moved to the supplement.
・Limitations of this approach at the present time must be addressed. The estimation of three-dimensional flow and dispersion of contamination, which is one of the major purposes of groundwater flow modeling, is not captured by this approach.
・ In this manuscript, a model with a simplified depth layer is constructed. If the layer (as grids) is increased for depth, would the computation time remain within a practical range? I believe that the importance of this question is related to the applicability of the developed method beyond the estimation of contamination locations under the simplified aquifer system (mentioned in the last line of conclusion).
Citation: https://doi.org/10.5194/egusphere-2025-6140-RC1 -
AC3: 'Reply on RC1', Qingyun Duan, 28 Mar 2026
We sincerely thank the reviewer for these thoughtful and constructive comments. We agree that these suggestions are highly valuable for improving the manuscript, particularly in terms of model description, presentation, readability, and discussion of the method’s limitations and applicability. We will carefully consider all of these comments and incorporate corresponding revisions in the revised manuscript. Detailed point-by-point responses will be provided in the subsequent submission.
Citation: https://doi.org/10.5194/egusphere-2025-6140-AC3
RC2: 'Comment on egusphere-2025-6140', Wei Gong, 06 Apr 2026
This paper proposed a new synergistic method called SA-CSA-TS based on (1) multi-chain (2) surrogate-assisted (3) hybrid optimization framework integrating the Cooperative Search Algorithm (CSA) and Tabu Search (TS) for Groundwater Contamination Source Identification (GCSI) problem. Three cases with synthetic and practical data were carried out, demonstrating that the proposed SA-CSA-TS method can consistently identify both the values of hydrogeological parameters and the locations and time-varying release fluxes of contaminant sources. The novel contributions of this paper are significant enough to be published on HESS. Only a few minor revisions are required.
- Section 2.2.2, Page 5. Why was CSA chosen instead of other heuristic optimization algorithms, such as the well-known Genetic Algorithm, Simulated Annealing, etc.? The reasons for choosing CSA need to be explained. After reading [Feng et al., 2021], I understand that CSA is a new optimization algorithm with outstanding performance, but it is not well known in the hydrology community. Please provide more information about CSA.
- Section 2.3, Pages 6-7. This paper established a surrogate-assisted optimization method for the GCSI problem, which needs to construct multiple surrogate models for each well. The computational burden of the surrogate models might be too heavy if the surrogate model itself is too expensive. Consequently, the surrogate model selected in this method should be cheap and effective. The reason for selecting RBF has been sufficiently demonstrated in the discussion in Section 5.2, lines 385-390, page 17. But the introduction of the compared surrogate models in Section 2.3 is somewhat insufficient and misleading.
- The relationship between Gaussian Process Regression (GPR) and Kriging interpolation should be explained. Gaussian Process Regression originated in machine learning, and Kriging interpolation originated in geostatistics. The two are mathematically essentially the same method; the main difference is that they express the same concepts differently because they developed in different communities. It is necessary to state here what kind of covariance function is used in the Gaussian Process Regression and how the hyper-parameters are set. If the mathematical equations are the same as those of Kriging, only one should be kept.
- The Radial Basis Function can be used as a covariance function in Gaussian Process Regression (GPR) and as a kernel in Support Vector Regression (SVR). Please list the setup details of GPR and SVR (e.g., the parameters assigned in the scikit-learn function call; please double-check the details if the default parameters were used) in order to demonstrate what was actually compared in this work.
- The GPR (merged with Kriging), SVR and RBF methods have many hyper-parameters, which may have a significant influence on the final fitting performance. The hyper-parameters in GPR (mainly in the covariance function) can be manually specified or automatically optimized, as elaborated in [Rasmussen and Williams, 2006]. The hyper-parameters of SVR with an RBF kernel (C, gamma, epsilon) are usually optimized with a crude grid search or manually specified according to expert knowledge. Please double-check the hyper-parameters used in the comparison: were they automatically optimized, or specified with default values? The fitting RMSE varies greatly, possibly because default hyper-parameters were used that are not appropriate for the problem in this paper. A good practice is to automatically optimize or manually tune the hyper-parameters after the initial sampling and first construction of the surrogate models.
- Section 3, page 8. This section is also methodology section, and this is actually the novel contribution of this paper. Some content in this section overlaps with section 2. It is better to move this section as section 2, in order to highlight the research significance. Move the introduction of surrogate models, CSA and TS, as sub-sections within this section.
- Table 3, page 16. Add RMSE values in this table.
- These two references are the same book? Please double check.
Rasmussen, C.E., Williams, C.K.I., 2006a. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA.
Rasmussen, C.E., Williams, C.K.I., 2006b. Gaussian processes for machine learning, Adaptive computation and machine learning. MIT Press, Cambridge, Mass.
Citation: https://doi.org/10.5194/egusphere-2025-6140-RC2 -
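The equivalence between GPR and Kriging raised in this comment can be stated compactly. The block below is a sketch in standard notation rather than the manuscript's: $K$ is the covariance matrix of the $n$ training inputs, $\mathbf{k}_*$ the vector of covariances between a new point $x_*$ and the training inputs, $\sigma_n^2$ a noise (nugget) variance, and $\mu$ a constant mean. All of these symbols are assumptions introduced here.

```latex
% GPR posterior mean and variance with constant prior mean \mu;
% simple Kriging with known mean \mu gives the same two expressions.
\begin{align}
  \hat{y}(x_*) &= \mu + \mathbf{k}_*^{\top}
      \left(K + \sigma_n^2 I\right)^{-1}\left(\mathbf{y} - \mu\mathbf{1}\right), \\
  s^2(x_*) &= k(x_*, x_*) - \mathbf{k}_*^{\top}
      \left(K + \sigma_n^2 I\right)^{-1}\mathbf{k}_* .
\end{align}
% Ordinary Kriging replaces the known mean by its generalized
% least-squares estimate in the predictor above:
\begin{equation}
  \hat{\mu} = \frac{\mathbf{1}^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y}}
                   {\mathbf{1}^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{1}} .
\end{equation}
```

Under these assumptions, the two methods differ only in how the mean and the covariance hyper-parameters are chosen, which is why the reported setup of each matters for the comparison.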
AC4: 'Reply on RC2', Qingyun Duan, 07 Apr 2026
We thank the reviewer for the careful reading of our manuscript and the insightful comments. We are also grateful for the positive assessment of the novelty and significance of the proposed SA-CSA-TS method. We fully agree with the suggestions raised in this review, which are very helpful for enhancing the clarity and rigor of the manuscript, especially regarding the justification of algorithm selection, the detailed description and fair comparison of surrogate models, the overall structure of the methodology, and the completeness of the presented results. All of these comments will be thoroughly addressed and reflected in the revised manuscript. A point-by-point response will be provided along with the revised version.
Citation: https://doi.org/10.5194/egusphere-2025-6140-AC4
General comments
Good research that needs some improvement. See my specific comments that should improve the manuscript.
Specific comments
Lines 34-35. “Groundwater contamination has become an increasingly critical issue, posing significant risks to environmental safety and public health”. Insert recent literature on groundwater contamination with an evident worldwide angle:
- Agbotui, P. Y., Firouzbehi, F., Medici, G. 2025. Review of effective porosity in sandstone aquifers: insights for representation of contaminant transport. Sustainability, 17(14), 6469.
- Sauvé, S., Desrosiers, M. 2014. A review of what is an emerging contaminant. Chemistry Central Journal, 8(1), 15.
Line 91. You need to disclose the general aim of the research.
Line 91. You need to describe the specific objectives of your research by using numbers (e.g., i, ii, and iii).
Line 92-onwards. You need to add more information on the boundary conditions.
Line 92-onwards. Add more detail on the nature of the geological material modelled.
Line 109. Overall, 9 equations in the manuscript are too many; not all of them are necessary. Equation 2 is very well known.
Line 175. The equations on kriging (a very well-known method) are not necessary.
Line 515. Assign a number to this equation.
Figures and tables
Figures 1-5. There is room to make the figures larger.
Figure 8. You need to discuss boundary conditions in more detail in the main body.
Figure 8. Increase the graphic resolution of the figure.