A ground motion prediction model for the Italian region based on a mixture of experts framework
Abstract. Earthquake ground-motion prediction is crucial for seismic design, seismic hazard assessment, and the resilience of urban infrastructure. Although extensive research has been conducted for Italy, existing models cover only a limited range of earthquake types, exhibit insufficient accuracy and uncertainty control under complex scenarios – thus lowering reliability – and provide a restricted set of ground-motion intensity measures (IMs) that cannot meet the multi-indicator needs of engineering practice and risk assessment. To address these issues, this study proposes a ground-motion prediction model for Italy based on a Mixture of experts (MOE) framework, in which XGBoost is employed as the expert submodels to enhance predictive accuracy and stability across diverse scenarios. We conduct a systematic comparison between the proposed MOE-XGB and a baseline Gaussian process regression model with an exponential kernel (GPR, exponential). The results show stable and balanced improvements across multiple IMs – such as peak ground acceleration (PGA), peak ground velocity (PGV), and spectral acceleration (SA) at different periods – demonstrating advantages in both accuracy and robustness. Furthermore, using the larger and more diverse ITACA (Italian Accelerometric Archive) dataset, we retrain and evaluate MOE-XGB. The model achieves higher accuracy on all considered metrics and maintains stable performance in generalization tests based on independent earthquake events, highlighting strong generalization capability and robustness. In summary, the proposed MOE-XGB provides a high-accuracy and broadly applicable solution for ground-motion prediction in Italy; meanwhile, the framework exhibits good transferability and scalability, offering a useful reference for fusion-model-driven ground-motion prediction in Europe and other regions.
General comments
The manuscript presents a machine-learning ground-motion prediction model for Italy based on a Mixture-of-Experts (MOE) framework combined with XGBoost regressors. The topic is timely and potentially relevant for the ground-motion and natural hazards communities, particularly in light of the growing interest in data-driven and hybrid approaches for seismic ground-motion prediction.
The manuscript is clearly written, technically detailed, and supported by an extensive set of figures and statistical analyses. The use of large Italian strong-motion datasets and the attempt to systematically compare the proposed framework with both empirical and machine-learning reference models are appreciated.
However, despite these positive aspects, the current version of the manuscript presents substantial methodological and interpretative shortcomings that prevent a reliable assessment of the proposed model’s actual predictive capability and scientific added value. In particular, the validation strategy, the interpretation of residual variability, the definition and documentation of the datasets and predictors, and the level of physical interpretability of the results require major revision.
For these reasons, I recommend reconsideration after major revisions. The issues raised below are structural rather than cosmetic, but in my view they can be addressed with a careful redesign of parts of the analysis and a more cautious interpretation of the results.
Major comments
The most critical issue concerns the validation strategy adopted throughout the manuscript. The authors rely on random splits of the dataset into training, validation, and test subsets (70/15/15). In ground-motion modeling, this approach is not appropriate, as recordings from the same earthquake are inherently correlated through shared source, path, and rupture characteristics.
With random splitting, records from the same seismic event can appear simultaneously in both training and test sets, leading to event-level information leakage. This typically results in overly optimistic performance metrics that reflect within-event learning rather than true generalization to unseen earthquakes.
Although the manuscript repeatedly refers to “independent testing” and “generalization to independent earthquake events” (including in the abstract), no strict event-wise validation strategy (e.g., leave-one-event-out, leave-multiple-events-out, or grouped cross-validation by event) is explicitly described or demonstrated.
Action required:
The authors must clearly document the validation strategy and, if not already done, redesign the evaluation using an explicit event-wise split. All performance metrics (RMSE, correlation, residual distributions, intra- and inter-event residuals) should be recomputed under this framework. Without event-wise independence, the reported performance improvements cannot be considered reliable.
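For illustration, the kind of split meant here can be sketched in a few lines. This is a minimal example, not the authors' pipeline: it assumes each record carries an event identifier (the name `event_ids` is hypothetical) and applies the 70/15/15 proportions to events rather than to individual records, so that no earthquake contributes to more than one subset.

```python
import random
from collections import defaultdict


def event_wise_split(event_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split record indices into train/val/test so that each event
    appears in exactly one subset.

    event_ids: one (hypothetical) event identifier per record.
    fractions: target share of *events* (not records) per subset.
    """
    # Group record indices by event.
    by_event = defaultdict(list)
    for idx, ev in enumerate(event_ids):
        by_event[ev].append(idx)

    # Shuffle the unique events reproducibly.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)

    n_train = round(fractions[0] * len(events))
    n_val = round(fractions[1] * len(events))
    parts = (
        events[:n_train],
        events[n_train:n_train + n_val],
        events[n_train + n_val:],
    )

    # Expand each block of events back into record indices.
    return tuple(sorted(i for ev in part for i in by_event[ev]) for part in parts)


# Toy data: 20 events with 1 to 5 records each (60 records in total).
event_ids = [f"EV{k:02d}" for k in range(20) for _ in range(1 + k % 5)]
train_idx, val_idx, test_idx = event_wise_split(event_ids)

train_events = {event_ids[i] for i in train_idx}
val_events = {event_ids[i] for i in val_idx}
test_events = {event_ids[i] for i in test_idx}
```

In scikit-learn, `GroupKFold` or `GroupShuffleSplit` with the event identifier passed as `groups` provides the same guarantee and extends naturally to grouped cross-validation.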
The manuscript emphasizes a strong reduction in residual dispersion (the total standard deviation, σ) and more tightly concentrated residuals compared to reference models. However, the treatment of aleatory variability remains largely descriptive and is not sufficiently grounded in established GMPE practice.
In flexible machine-learning models, reductions in residual variance may arise from overfitting, implicit smoothing, or exploitation of event-specific patterns rather than from a genuinely improved representation of source, path, and site effects. The manuscript does not assess whether the inferred variability levels are physically plausible, nor does it benchmark them against reference values of the total (σ), between-event (τ), and within-event (ϕ) standard deviations from established Italian or pan-European GMPEs.
Action required:
The authors should (i) clarify how residual components are estimated in the presence of complex ML architectures, (ii) compare the resulting variability levels with published GMPE variability models, and (iii) discuss whether the observed reductions are physically meaningful or may reflect methodological artifacts.
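As a concrete reference for point (i), the sketch below splits total residuals into between-event and within-event terms. It is a deliberate simplification: the mean residual of each event stands in for the between-event term, whereas GMPE practice normally uses a mixed-effects regression. The function name and the toy residuals are hypothetical, and for events with few records this estimate of τ is inflated by within-event scatter.

```python
import math
from collections import defaultdict


def decompose_residuals(residuals, event_ids):
    """Approximate between-event (tau) and within-event (phi) standard deviations.

    residuals: ln(observed) - ln(predicted), one value per record.
    event_ids: event identifier per record.
    Simplification: the between-event term of an event is its mean residual.
    """
    by_event = defaultdict(list)
    for r, ev in zip(residuals, event_ids):
        by_event[ev].append(r)

    # Between-event terms: mean residual of each event.
    delta_b = {ev: sum(v) / len(v) for ev, v in by_event.items()}
    # Within-event terms: what remains after removing the event term.
    delta_w = [r - delta_b[ev] for r, ev in zip(residuals, event_ids)]

    def std(values):
        # Population standard deviation about the sample mean.
        m = sum(values) / len(values)
        return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

    tau = std(list(delta_b.values()))
    phi = std(delta_w)
    sigma = math.sqrt(tau ** 2 + phi ** 2)
    return tau, phi, sigma


# Toy residuals for three events with two records each.
toy_residuals = [0.3, 0.1, -0.2, -0.4, 0.5, 0.3]
toy_events = ["A", "A", "B", "B", "C", "C"]
tau, phi, sigma = decompose_residuals(toy_residuals, toy_events)
```

Values of τ, ϕ, and σ obtained this way on event-wise held-out data are the quantities that should be set against published variability models.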
The manuscript does not clearly and consistently document the full set of predictors used in the MOE-XGB model. While some variables are mentioned (e.g., magnitude, distance, Vs30, elevation, coordinates), a complete and unambiguous list of input features is missing, as is a discussion of their relative roles.
Action required:
The authors should provide a clear table listing all predictors used, their definitions, data sources, and preprocessing steps. This is essential for reproducibility and for assessing the physical consistency of the model.
Given the complexity of the proposed MOE-XGB framework and the strong claims regarding predictive performance, the manuscript lacks a systematic interpretability analysis. No global or local assessment of variable importance is provided, and the role of individual predictors in controlling ground-motion behavior remains unclear.
Action required:
The authors should include an explicit interpretability analysis (e.g., SHAP-based global importance such as beeswarm plots, partial dependence or accumulated local effects plots) to clarify which predictors dominate the predictions across different intensity measures and periods. This would substantially improve the scientific value of the work and help distinguish physically meaningful patterns from purely data-driven behavior.
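The sketch below shows the general idea using permutation importance, a simpler model-agnostic alternative to the SHAP analysis suggested above. It is not a substitute for that analysis: the predictor `toy_predict` is a hypothetical stand-in for a fitted model, and the three features (magnitude, distance, and an unused variable) are invented for the example.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when one feature column is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions
             (stands in for any fitted model).
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted model: ln(IM) depends on magnitude
# and log-distance only; the third feature is ignored by construction.
def toy_predict(X):
    return [1.2 * m - 1.5 * math.log(r) for m, r, _ in X]


data_rng = random.Random(1)
X = [
    [data_rng.uniform(4.0, 7.0), data_rng.uniform(5.0, 200.0), data_rng.random()]
    for _ in range(200)
]
y = toy_predict(X)  # noise-free targets, so the baseline RMSE is zero

imp_mag, imp_dist, imp_unused = permutation_importance(toy_predict, X, y)
```

A feature the model ignores receives zero importance, while magnitude and distance do not. For tree ensembles such as XGBoost, the `shap` package's `TreeExplainer` would additionally give the per-record attributions needed for beeswarm plots.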
The manuscript refers to the ITACA dataset but does not clearly specify which version of the ITACAext flatfiles is used. Recent releases (e.g., ITACAext flatfile 2.0; Lanzano et al., 2024) provide updated metadata and intensity measures.
Action required:
The authors should explicitly state the exact dataset version used and justify their choice if a non-latest release is adopted. Clear and precise data citation is essential for transparency and reproducibility.
The manuscript repeatedly suggests applicability to seismic hazard assessment, including probabilistic seismic hazard analysis (PSHA). However, no hazard-oriented application is demonstrated, and key aspects such as spatial correlation, rupture geometry, and integration into PSHA workflows are not addressed.
Action required:
The authors should substantially tone down claims related to seismic hazard applications or explicitly demonstrate how the proposed model could be integrated into PSHA frameworks. At present, such claims appear overstated relative to the analyses performed.
Minor comments and technical issues
Recommendation
In its current form, the manuscript does not meet the methodological standards required for publication. However, I believe that, with substantial revision, the study could become a meaningful contribution. I therefore recommend reconsideration after major revisions, addressing in particular the validation strategy, interpretability, dataset definition, and the physical interpretation of residual variability.