the Creative Commons Attribution 4.0 License.
Parameterization of the Subsolar Standoff Distance of Earth's Magnetopause based on Results from Machine Learning
Abstract. The subsolar standoff distance r0 of Earth's magnetopause is a key parameter in understanding the interaction between the solar wind and the magnetosphere. Despite decades of modeling efforts, significant discrepancies persist between model predictions and satellite observations of the magnetopause location. This study introduces a new data-driven parameterization of r0, based on a dataset containing over 220,000 dayside magnetopause crossings obtained by the THEMIS (2007–2022) and Cluster (2001–2020) missions. Each crossing is paired with high-resolution upstream solar wind parameters from the OMNI database. Four established empirical models are benchmarked against this dataset, yielding root-mean-square errors (RMSE) of ≳ 1 RE globally and ≳ 0.8 RE in the subsolar region. To identify the primary physical factors controlling r0, an XGBoost regression model is trained and interpreted using SHAP (SHapley Additive exPlanations) values. The solar wind dynamic pressure is found to be the dominant contributor, followed by the geomagnetic indices (AE, SYMH), the interplanetary magnetic field (IMF) magnitude, the dipole tilt angle, and the IMF cone angle. The IMF Bz component contributes only marginally when geomagnetic indices are included. A support vector regression (SVR) model using the six most influential parameters achieves an RMSE of 0.68 RE, improving on the best analytic model by approximately 17 %. A second-order polynomial expression with 14 terms is derived, providing a compact, interpretable, and accurate representation of r0. Neither the SVR model nor the polynomial representation is able to predict r0 for extreme input conditions, e.g., during the passage of interplanetary coronal mass ejections. Accordingly, the parameter ranges that define the validity domain of the models are specified. The presented results offer improved predictive accuracy for the subsolar standoff distance and highlight the role of previously unconsidered parameters in modeling Earth's magnetopause.
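The abstract does not spell out how the ~17 % improvement is computed. A minimal sketch, assuming it is the relative RMSE reduction (RMSE_ref - RMSE_new) / RMSE_ref: under that assumption, the quoted SVR RMSE of 0.68 RE implies a best-analytic RMSE of about 0.82 RE, in line with the ≳ 0.8 RE quoted for the subsolar region. The function names are illustrative, not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations (same units, e.g. RE)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


def relative_improvement(rmse_reference, rmse_new):
    """Fractional RMSE reduction of a new model relative to a reference model (assumed definition)."""
    return (rmse_reference - rmse_new) / rmse_reference


def implied_reference_rmse(rmse_new, improvement):
    """Back-solve the reference RMSE, assuming improvement = (ref - new) / ref."""
    return rmse_new / (1.0 - improvement)


# Values quoted in the abstract: SVR RMSE of 0.68 RE and an improvement of ~17 %.
ref = implied_reference_rmse(0.68, 0.17)
print(f"implied best-analytic RMSE ~ {ref:.2f} RE")  # 0.82
```

Because the 17 % figure is rounded, the back-solved value is only approximate.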
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4530', Anonymous Referee #1, 28 Oct 2025
AC1: 'Reply on RC1', Lars Klingenstein, 07 Nov 2025
We would like to express our gratitude to anonymous referee no. 1 for reviewing the preprint and for their contribution to quality control. Their comments are highly appreciated and of great value in improving the scientific quality of the paper.
Line 24 – the references to models that explicitly use Bz as a parameter are incomplete; I suggest adding “for example” to the brackets with references.
Thank you for pointing that out, “e.g.” was added to the citations.
Line 42 – I suggest discarding the sentence about the Sh98 model starting in this line and continuing the text.
The sentence is removed for a better overall flow.
Line 125 – The formula uses only the Earth's orbital motion, but the aberration depends on the perpendicular solar wind components. It is true that the analysis in Safrankova et al. (2002) revealed that applying propagated values of the perpendicular components does not improve the prediction significantly, and the authors argue that the main reason is probably the uncertainty in the propagation of these components. However, Nemecek et al. (2020) have shown that there is a systematic deflection of the solar wind from the radial direction in the fast wind, and applying this finding can further improve the prediction. This point should be discussed.
Thank you for this interesting comment. We added some sentences, starting in lines 126 and 129, discussing the remark. The paragraph reads as follows:
“Equation 4 is a simplification of the exact formula ψ = arctan(v⊥/vx), where v⊥ = vy + vE. Since the vx component dominates the solar wind direction, vx ≈ v can safely be assumed. However, the vy component is often not small compared to vE, especially in the fast solar wind, where the flow deflection has a median value of 18 km s⁻¹ (Nemecek et al., 2020a). Including the vy component would increase the absolute aberration by almost two degrees, which particularly affects the position of the MPCs on the nightside. In the subsolar region, the aberration of the position is neglected since the effect is smaller and the absolute value of the magnetopause distance, which is unchanged by the aberration correction, is more important for our model than the exact position of the MPC. By rotating coordinate-sensitive parameters, such as the position and the IMF, by ψ around the z-axis, the system is transformed into aberrated GSE (AGSE) coordinates. Again, the transformation is done for each MPC separately with its respective aberration angle. Derived quantities (like the clock and cone angles, the zenith angle, etc.) are then recomputed in AGSE coordinates. A comparison of the AGSE IMF components derived with this method to those obtained when the aberration includes the vy component (see above) shows discrepancies of only a few tenths of a nanotesla. We argue that the uncertainty in the IMF data is larger than this deviation and therefore use the simplified equation for the aberration. Including vy could be considered in future studies to account for the aforementioned effects more precisely.”
Lines 254 and 255 – I would suggest rephrasing the sentence, because quantities such as n_estimator or learning_rate are specific to the software used and are not necessarily clear to readers who are not familiar with ML techniques.
We have added an explanation that specifies what the hyperparameter names mean in a more general setting, not specific to the software used. The sentence reads:
“All but three hyperparameters of the XGB model are kept at their default values. The adjusted hyperparameters are the number of trees in the boosted ensemble (n_estimators = 1000), the maximum depth, i.e. the maximum number of successive decisions, of each tree (max_depth = 6), and the contribution of each tree, which also influences how conservative the model is and how it generalizes (learning_rate = 0.05).”
Line 329 – The authors probably have in mind “spatial coverage”.
That is right, the typo is corrected.
References:
Němeček, Z., Ďurovcová, T., Šafránková, J., Richardson, J. D., Šimůnek, J., and Stevens, M. L.: (Non)radial Solar Wind Propagation through the Heliosphere, Astrophys. J. Lett., 897, L39, 2020.
Šafránková, J., Němeček, Z., Dusik, S., Prech, L., Sibeck, D. G., and Borodkova, N. N.: The magnetopause shape and location: a comparison of the Interball and Geotail observations with models, Ann. Geophys., 20, 301–309, 2002.
Citation: https://doi.org/10.5194/egusphere-2025-4530-AC1
RC3: 'Reply on AC1', Anonymous Referee #1, 07 Nov 2025
I am fully satisfied with the corrections made by the authors, and I am happy to recommend the article for publication.
Citation: https://doi.org/10.5194/egusphere-2025-4530-RC3
AC3: 'Reply on RC3', Lars Klingenstein, 10 Nov 2025
Thank you very much for the recommendation and also once again for the valuable comments.
Citation: https://doi.org/10.5194/egusphere-2025-4530-AC3
RC2: 'Comment on egusphere-2025-4530', Anonymous Referee #2, 28 Oct 2025
The manuscript presents a valuable contribution to the study of the Earth’s magnetopause by developing a new data-driven parameterization of the subsolar standoff distance. Using an extensive dataset of over 220,000 magnetopause crossings from THEMIS (2007–2022) and Cluster (2001–2020), the authors benchmark several established empirical models and then apply modern machine learning methods (XGBoost, SVR, and SHAP interpretation) to identify key controlling parameters and develop improved predictive models. Interestingly, they find that geomagnetic indices (SYM-H and AE) are more influential than IMF magnitude and Bz when included jointly, highlighting that geomagnetic activity may encapsulate some of the physical effects traditionally attributed to the IMF. The final SVR and second-order polynomial models both show improved accuracy compared to existing parameterizations, and the authors carefully discuss their limitations under extreme conditions.
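The SHAP interpretation mentioned above rests on Shapley values from cooperative game theory: a feature's value is its marginal contribution to the prediction, averaged over all subsets S of the other features with weights |S|!(n - |S| - 1)!/n!. SHAP implementations use much faster algorithms for tree ensembles such as XGBoost; the brute-force sketch below (exponential in the number of features) only illustrates the definition, assuming that features absent from a subset are set to a fixed baseline. The toy model and inputs are hypothetical and unrelated to the authors' fitted model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(v):
    """Hypothetical model: two linear terms plus an interaction between features 0 and 2."""
    return 3.0 * v[0] - v[1] + 0.5 * v[0] * v[2]


x = [2.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [8.0, -1.0, 2.0]
```

The values sum to f(x) - f(baseline) (the efficiency property), and the interaction term 0.5·x0·x2 is split equally between features 0 and 2.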
Strengths: The manuscript addresses an important and long-standing issue in magnetospheric physics: producing an accurate yet easily implementable model of the magnetopause. The authors make sound methodological choices, applying established techniques for magnetopause classification and subsolar data reduction. The use of SHAP values provides a clear and interpretable assessment of variable importance, strengthening confidence in the physical consistency of the results. The paper is clearly written, logically structured, and effectively contextualizes the results within the broader literature. The inclusion of explicit validity domains and discussion of limitations (e.g., performance under CME conditions and challenges in cusp regions) demonstrates scientific rigor and transparency.
I recommend the manuscript for publication with a suggested minor revision:
The manuscript refers to a second-order polynomial representation of the subsolar standoff distance but does not provide its explicit formula. I think that including a mathematical expression for the second-order polynomial model would greatly enhance accessibility and allow other researchers to readily implement the model.
Citation: https://doi.org/10.5194/egusphere-2025-4530-RC2
AC2: 'Reply on RC2', Lars Klingenstein, 07 Nov 2025
We would like to express our gratitude to anonymous referee no. 2 for reviewing the preprint and for their contribution to quality control. Their comments are highly appreciated and of great value in improving the scientific quality of the paper.
The manuscript refers to a second-order polynomial representation of the subsolar standoff distance but does not provide its explicit formula. I think that including a mathematical expression for the second-order polynomial model would greatly enhance accessibility and allow other researchers to readily implement the model.
Thank you for the comment. We have included the formula for the non-reduced polynomial model as equation (B1). r0 is given in units of RE; the parameters are dimensionless because they are expected to be scaled, as explained in the text, before they are used in the formula. Table B1 has been removed, since the written-out formula provides the same information and the table would be redundant. Appendix B has also been renamed, since it now contains the model equation.
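For orientation, a complete second-order polynomial in n scaled inputs has 1 + n + n(n+1)/2 terms, i.e. 28 for the six SVR inputs named in the abstract (dynamic pressure, AE, SYMH, IMF magnitude, dipole tilt, cone angle). If the 14-term expression mentioned in the abstract is built from these six inputs, it retains a subset of these monomials; which ones is not stated here. A minimal sketch that enumerates the monomials and evaluates such a polynomial on already-scaled inputs; the feature labels and the coefficients used below are hypothetical placeholders, not those of equation (B1).

```python
from itertools import combinations_with_replacement

# Illustrative labels for the six inputs; the paper's own symbols may differ.
FEATURES = ["Pdyn", "AE", "SYMH", "B", "tilt", "cone"]


def second_order_terms(n_features):
    """Index tuples for all monomials of degree <= 2: (), (i,), and (i, j) with i <= j."""
    terms = [()]
    for degree in (1, 2):
        terms.extend(combinations_with_replacement(range(n_features), degree))
    return terms


def evaluate(coefficients, scaled_x):
    """Sum of c * prod(scaled_x[k] for k in term); coefficients maps index tuples to values."""
    total = 0.0
    for term, c in coefficients.items():
        value = c
        for k in term:
            value *= scaled_x[k]
        total += value
    return total


terms = second_order_terms(len(FEATURES))
print(len(terms))  # 28
```

Storing coefficients in a dictionary keyed by index tuples makes a reduced polynomial simply a dictionary with fewer entries.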
Citation: https://doi.org/10.5194/egusphere-2025-4530-AC2
Referee report on the paper
Parameterization of the Subsolar Standoff Distance of Earth’s Magnetopause based on Results from Machine Learning
by Klingenstein et al.
The manuscript presents a critical review of present magnetopause models and suggests a new method for predicting the subsolar magnetopause position based on a machine learning approach. The comparison of the machine learning results with several previous models shows a more precise prediction. The most surprising result of the present analysis is that the out-of-ecliptic IMF component (Bz) plays a minor role in predicting the magnetopause location; its effect is hidden in the dependence on geomagnetic indices and other parameters that correlate with it.
The manuscript is written in good English, its organization is appropriate and thus I have only a few minor comments:
Line 24 – the references to models that explicitly use Bz as a parameter are incomplete; I suggest adding “for example” to the brackets with references.
Line 42 – I suggest discarding the sentence about the Sh98 model starting in this line and continuing the text.
Line 125 – The formula uses only the Earth's orbital motion, but the aberration depends on the perpendicular solar wind components. It is true that the analysis in Safrankova et al. (2002) revealed that applying propagated values of the perpendicular components does not improve the prediction significantly, and the authors argue that the main reason is probably the uncertainty in the propagation of these components. However, Nemecek et al. (2020) have shown that there is a systematic deflection of the solar wind from the radial direction in the fast wind, and applying this finding can further improve the prediction. This point should be discussed.
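The aberration discussed in this comment and in the authors' reply can be sketched numerically. A minimal sketch, assuming ψ = arctan((vy + vE)/vx) with vx ≈ v (as in the reply), all speeds taken as positive magnitudes, and vE ≈ 29.8 km/s; setting vy = 0 gives the Earth-motion-only simplification. The rotation sense used for the z-axis rotation is an assumption, since the sign of ψ depends on the GSE sign conventions adopted. Function names are illustrative.

```python
import math

V_EARTH_KM_S = 29.8  # approximate mean orbital speed of Earth (assumed value)


def aberration_deg(v_sw_km_s, v_y_km_s=0.0, v_earth_km_s=V_EARTH_KM_S):
    """Aberration angle in degrees, psi = arctan((v_y + v_E) / v_x), with v_x ~ solar wind speed.

    v_y = 0 reproduces the simplification that uses only Earth's orbital motion.
    """
    return math.degrees(math.atan((v_y_km_s + v_earth_km_s) / v_sw_km_s))


def rotate_about_z(vec, psi_deg):
    """Rotate a 3-vector by psi (degrees) about the z-axis (counter-clockwise; sign convention assumed)."""
    x, y, z = vec
    c, s = math.cos(math.radians(psi_deg)), math.sin(math.radians(psi_deg))
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical fast-wind example with the 18 km/s median deflection quoted in the reply.
print(aberration_deg(600.0), aberration_deg(600.0, 18.0))
```

For a 600 km/s wind, adding vy = 18 km/s raises ψ from about 2.8° to about 4.6°, roughly 1.7°, consistent with the "almost two degrees" in the reply. The rotation preserves vector length, which is why the radial magnetopause distance is unaffected by the aberration correction.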
Lines 254 and 255 – I would suggest rephrasing the sentence, because quantities such as n_estimator or learning_rate are specific to the software used and are not necessarily clear to readers who are not familiar with ML techniques.
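To make the hyperparameter names concrete for readers outside ML: in the xgboost Python package they are keyword arguments of `XGBRegressor` (`n_estimators`, `max_depth`, `learning_rate`). The toy loop below is a plain gradient-boosting sketch, not XGBoost itself (it omits XGBoost's regularisation and uses one-split trees, i.e. max_depth = 1), but it shows the shared mechanism: each new tree is fitted to the residuals of the current ensemble, `n_estimators` sets how many trees are added, and `learning_rate` scales each tree's contribution. The data are synthetic and hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (a single split) to the residuals by least squares."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct x values to split")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = 0.5 * (lo + hi)
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_estimators, learning_rate):
    """Toy gradient boosting for squared error: each tree fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_estimators):  # n_estimators = number of trees in the ensemble
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # learning_rate shrinks how much each new tree changes the prediction
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    """Mean squared error of a model on paired data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(1, 21)]
ys = [x * x for x in xs]
for n in (10, 200):
    print(n, mse(boost(xs, ys, n, 0.05), xs, ys))
```

With a small learning rate, many trees are needed before the training error drops, which is why a low `learning_rate` is usually paired with a large `n_estimators`.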
Line 329 – The authors probably have in mind “spatial coverage”.
References:
Němeček, Z., Ďurovcová, T., Šafránková, J., Richardson, J. D., Šimůnek, J., and Stevens, M. L.: (Non)radial Solar Wind Propagation through the Heliosphere, Astrophys. J. Lett., 897, L39, 2020.
Šafránková, J., Němeček, Z., Dusik, S., Prech, L., Sibeck, D. G., and Borodkova, N. N.: The magnetopause shape and location: a comparison of the Interball and Geotail observations with models, Ann. Geophys., 20, 301–309, 2002.