Parameterization of the Subsolar Standoff Distance of Earth's Magnetopause based on Results from Machine Learning

Klingenstein, Lars; Grimmich, Niklas; Shprits, Yuri; Pöppelwerth, Adrian; Plaschke, Ferdinand

doi:10.5194/egusphere-2025-4530

Preprints

https://doi.org/10.5194/egusphere-2025-4530

25 Sep 2025

Abstract. The subsolar standoff distance r₀ of Earth's magnetopause is a key parameter in understanding the interaction between the solar wind and the magnetosphere. Despite decades of modeling efforts, significant discrepancies persist between model predictions and satellite observations of the magnetopause location. This study introduces a new data-driven parameterization of r₀, based on a dataset containing over 220,000 dayside magnetopause crossings obtained by the THEMIS (2007–2022) and Cluster (2001–2020) missions. Each crossing is paired with high-resolution upstream solar wind parameters from the OMNI database. Four established empirical models are benchmarked against this dataset, yielding root-mean-square errors (RMSE) of ≳ 1 R_E globally and ≳ 0.8 R_E in the subsolar region. To determine the primary physical factors controlling r₀, an XGBoost regression model is trained and interpreted using SHapley Additive exPlanation (SHAP) values. The solar wind dynamic pressure is found to be the dominant contributor, followed by geomagnetic indices (AE, SYMH), interplanetary magnetic field (IMF) magnitude, dipole tilt angle, and IMF cone angle. The IMF B_z component contributes only marginally when geomagnetic indices are included. A support vector regression (SVR) model using the six most influential parameters achieves an RMSE of 0.68 R_E, improving on the best analytic model by approximately 17 %. A second-order polynomial expression with 14 terms is derived, providing a compact, interpretable, and accurate representation of r₀. The SVR model and the polynomial representation are not able to predict r₀ for extreme input conditions, e.g., during the passage of interplanetary coronal mass ejections. Accordingly, the parameter ranges that define the validity domain of the models are specified. The presented results offer improved predictive accuracy of the subsolar standoff distance and highlight the role of so far unconsidered parameters in modeling Earth's magnetopause.
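The quoted error figures can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the abstract gives the SVR RMSE (0.68 R_E) and the relative improvement (about 17 %), but not the exact RMSE of the best analytic model, so the baseline is back-computed here and should be read as an implied value rather than a reported one. The function names are hypothetical.

```python
def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction of a new model relative to a baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


def implied_baseline(new_rmse, improvement):
    """Baseline RMSE implied by a new RMSE and a fractional improvement."""
    return new_rmse / (1.0 - improvement)


# Values taken from the abstract; the baseline itself is NOT quoted there.
svr_rmse = 0.68      # R_E
improvement = 0.17   # "approximately 17 %"
baseline = implied_baseline(svr_rmse, improvement)
```

Under these assumptions the implied baseline lies slightly above 0.8 R_E, which is consistent with the ≳ 0.8 R_E quoted for the established models in the subsolar region.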

Received: 15 Sep 2025 – Discussion started: 25 Sep 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 1240 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

15 Dec 2025

Parameterization of the subsolar standoff distance of Earth's magnetopause based on results from machine learning

Lars Klingenstein, Niklas Grimmich, Yuri Y. Shprits, Adrian Pöppelwerth, and Ferdinand Plaschke

Ann. Geophys., 43, 835–854, https://doi.org/10.5194/angeo-43-835-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4530', Anonymous Referee #1, 28 Oct 2025

Referee report on the paper
Parameterization of the Subsolar Standoff Distance of Earth’s Magnetopause based on Results from Machine Learning
by Klingenstein et al.
The manuscript presents a critical review of present magnetopause models and suggests a new method for predicting the subsolar magnetopause position that is based on a machine learning approach. The comparison of the machine learning results with several previous models shows a more precise prediction. The most surprising result of the present analysis is that the out-of-ecliptic IMF component plays a minor role in predicting the magnetopause location; its effect is hidden in the dependence on geomagnetic indices and other parameters that correlate with it.
The manuscript is written in good English, its organization is appropriate, and thus I have only a few minor comments:
Line 24 – the list of references to models that explicitly use Bz as a parameter is incomplete; I suggest adding “for example” to the brackets with references.
Line 42 – I suggest discarding the sentence about the Sh98 model starting in this line and continuing the text.
Line 125 – The formula uses only the Earth's orbital motion, but the aberration depends on the perpendicular solar wind components. It is true that the analysis in Safrankova et al. (2002) revealed that applying propagated values of the perpendicular components does not improve the prediction significantly, and the authors argue that the main reason is probably the uncertainty in the propagation of these components. However, Nemecek et al. (2020) have shown that there is a systematic deflection of the solar wind from the radial direction in the fast wind, and applying this finding can further improve the prediction. This point should be discussed.
Lines 254 and 255 – I would suggest rephrasing the sentence, because quantities like n_estimators or learning_rate are specific to the software used and are not necessarily clear to readers who are not familiar with ML techniques.
Line 329 – The authors probably have in mind “spatial coverage”
References:
Němeček, Z; Ďurovcová, T; Šafránková, J; Richardson, JD; Šimůnek, J; Stevens, ML, (Non)radial Solar Wind Propagation through the Heliosphere, Astrophys. J. Lett., 897 (2): Art. No. L39, 2020.
Safrankova, J; Nemecek, Z; Dusik, S; Prech, L; Sibeck, DG; Borodkova, NN, The magnetopause shape and location: a comparison of the Interball and Geotail observations with models, Ann. Geophys., 20 (3): 301–309, 2002.

Citation: https://doi.org/10.5194/egusphere-2025-4530-RC1
- AC1:
  'Reply on RC1', Lars Klingenstein, 07 Nov 2025
  
  We would like to express our gratitude to anonymous referee no. 1 for reviewing the preprint and for their contribution to quality control. Their comments are highly appreciated and of great value in improving the scientific quality of the paper.
  Line 24 – the list of references to models that explicitly use Bz as a parameter is incomplete; I suggest adding “for example” to the brackets with references.
  Thank you for pointing that out; “e.g.” was added to the citations.
  
  Line 42 – I suggest discarding the sentence about the Sh98 model starting in this line and continuing the text.
  The sentence has been removed for a better overall flow.
  
  Line 125 – The formula uses only the Earth's orbital motion, but the aberration depends on the perpendicular solar wind components. It is true that the analysis in Safrankova et al. (2002) revealed that applying propagated values of the perpendicular components does not improve the prediction significantly, and the authors argue that the main reason is probably the uncertainty in the propagation of these components. However, Nemecek et al. (2020) have shown that there is a systematic deflection of the solar wind from the radial direction in the fast wind, and applying this finding can further improve the prediction. This point should be discussed.
  Thank you for this interesting comment. We added some sentences starting in lines 126 and 129 discussing the remark. The paragraph reads as follows:
  
  “Equation 4 is a simplification of the exact formula ψ=arctan(v_⊥/v_x), where v_⊥=v_y+v_E. Since the v_x component dominates the solar wind direction, v_x≈v can be assumed safely. The v_y component, however, is often not small compared to v_E, especially not in the fast solar wind, where the flow deflection has a median value of 18 km s⁻¹ (Nemecek et al., 2020a). Including the v_y component would increase the absolute aberration by almost two degrees, which especially affects the position of the MPCs on the nightside. In the subsolar region, the aberration of the position is neglected since the effect is smaller and the absolute value of the magnetopause distance, which is unchanged by the aberration correction, is more important for our model than the exact position of the MPC. By rotating coordinate-sensitive parameters, such as the position and the IMF, by ψ around the z-axis, the system is transformed into aberrated GSE (AGSE) coordinates. Again, the transformation is done for each MPC separately with its respective aberration angle. Derived quantities (like the clock and cone angles, the zenith angle, etc.) are then recomputed in AGSE format. A comparison of the AGSE IMF components derived with this method to those obtained when the aberration includes the v_y component (see above) shows discrepancies of only a few tenths of a nanotesla. We argue that the uncertainty in the IMF data is larger than this deviation and therefore use the simplified equation for the aberration. Including v_y could be considered in future studies to account for the aforementioned effects more precisely.”
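The aberration correction described in the quoted paragraph can be sketched as follows. This is a minimal illustration, not the authors' code: the value of v_E (Earth's mean orbital speed, taken here as 29.78 km s⁻¹), the use of flow-speed magnitudes, the sign convention of the rotation, and all function names are assumptions made for the example.

```python
import math

V_EARTH = 29.78  # km/s; assumed value for v_E (Earth's mean orbital speed)


def aberration_angle(v_x, v_y=0.0, v_e=V_EARTH):
    """Aberration angle psi in radians, psi = arctan((v_y + v_E) / v_x).

    v_x is the magnitude of the anti-sunward flow speed in km/s. With
    v_y = 0 and v_x replaced by the total speed v, this reduces to the
    simplified form described in the reply.
    """
    return math.atan2(v_y + v_e, v_x)


def rotate_about_z(vec, psi):
    """Rotate a 3-vector (position or IMF) by psi around the z-axis.

    The rotation direction is one possible sign convention; the paper's
    convention is not stated in the reply.
    """
    x, y, z = vec
    c, s = math.cos(psi), math.sin(psi)
    return (c * x - s * y, s * x + c * y, z)


# Fast-wind example: 600 km/s with the 18 km/s median deflection quoted above.
psi_simple = aberration_angle(600.0)
psi_full = aberration_angle(600.0, v_y=18.0)
extra_deg = math.degrees(psi_full - psi_simple)

# A rotation changes the direction of a crossing but not its radial distance.
crossing_gse = (10.0, 2.0, 1.0)  # hypothetical position in R_E
crossing_agse = rotate_about_z(crossing_gse, psi_simple)
```

Because the transformation is a pure rotation, the radial distance of each crossing is preserved, which is the property the authors rely on when they neglect the v_y contribution in the subsolar region.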
  
  Lines 254 and 255 – I would suggest rephrasing the sentence, because quantities like n_estimators or learning_rate are specific to the software used and are not necessarily clear to readers who are not familiar with ML techniques.
  We have added an explanation of what the hyperparameter names mean in a more general setting, independent of the software used. The sentence reads:
  
  “All but three hyperparameters of the XGB model are kept at their default values. The adjusted hyperparameters are the number of trees in the random forest (n_estimators = 1000), the depth, i.e. number of decisions, of each tree (max_depth = 6), and the contribution of each tree, which also influences how conservative the model is and how it generalizes (learning_rate = 0.05).”
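For readers who want to see how the three quoted settings map onto code, a minimal sketch is given below. It assumes the scikit-learn-style `XGBRegressor` class from the `xgboost` Python package; the reply does not state which interface the authors used, and the names `XGB_PARAMS` and `build_xgb_model` are hypothetical.

```python
# The three hyperparameters quoted in the reply; everything else is left at
# the library defaults.
XGB_PARAMS = {
    "n_estimators": 1000,   # number of boosted trees
    "max_depth": 6,         # maximum depth of each tree
    "learning_rate": 0.05,  # shrinkage applied to each tree's contribution
}


def build_xgb_model(params=None):
    """Return an untrained XGBoost regressor with the quoted settings.

    The import is done lazily so that the parameter dictionary can be
    inspected even when xgboost is not installed.
    """
    from xgboost import XGBRegressor

    return XGBRegressor(**(params or XGB_PARAMS))
```

A model built this way would then be fitted with `model.fit(X, y)`, where `X` holds the upstream and geomagnetic input parameters and `y` the observed standoff distances.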
  
  Line 329 – The authors probably have in mind “spatial coverage”
  That is right, the typo is corrected.
  
  References:
  
  Němeček, Z; Ďurovcová, T; Šafránková, J; Richardson, JD; Šimůnek, J; Stevens, ML, (Non)radial Solar Wind Propagation through the Heliosphere, Astrophys. J. Lett., 897 (2): Art. No. L39, 2020.
  
  Safrankova, J; Nemecek, Z; Dusik, S; Prech, L; Sibeck, DG; Borodkova, NN, The magnetopause shape and location: a comparison of the Interball and Geotail observations with models, Ann. Geophys., 20 (3): 301–309, 2002.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4530-AC1
  - RC3: 'Reply on AC1', Anonymous Referee #1, 07 Nov 2025
    
    I am fully satisfied with the corrections made by the authors, and I am happy to recommend the article for publication.
    
    Citation: https://doi.org/10.5194/egusphere-2025-4530-RC3
    
    AC3: 'Reply on RC3', Lars Klingenstein, 10 Nov 2025
    
    Thank you very much for the recommendation and also once again for the valuable comments.
    
    Citation: https://doi.org/10.5194/egusphere-2025-4530-AC3
RC2:
'Comment on egusphere-2025-4530', Anonymous Referee #2, 28 Oct 2025

The manuscript presents a valuable contribution to the study of the Earth’s magnetopause by developing a new data-driven parameterization of the subsolar standoff distance. Using an extensive dataset of over 220,000 magnetopause crossings from THEMIS (2007–2022) and Cluster (2001–2020), the authors benchmark several established empirical models and then apply modern machine learning methods (XGBoost, SVR, and SHAP interpretation) to identify key controlling parameters and develop improved predictive models. Interestingly, they find that geomagnetic indices (SYM-H and AE) are more influential than IMF magnitude and Bz when included jointly, highlighting that geomagnetic activity may encapsulate some of the physical effects traditionally attributed to the IMF. The final SVR and second-order polynomial models both show improved accuracy compared to existing parameterizations, and the authors carefully discuss their limitations under extreme conditions.

Strengths: The manuscript addresses an important and long-standing issue in magnetospheric physics: producing an accurate yet easily implementable model of the magnetopause. The authors make sound methodological choices, applying established techniques for magnetopause classification and subsolar data reduction. The use of SHAP values provides a clear and interpretable assessment of variable importance, strengthening confidence in the physical consistency of the results. The paper is clearly written, logically structured, and effectively contextualizes the results within the broader literature. The inclusion of explicit validity domains and discussion of limitations (e.g., performance under CME conditions and challenges in cusp regions) demonstrates scientific rigor and transparency.
I recommend the manuscript for publication with a suggested minor revision:
The manuscript refers to a second-order polynomial representation of the subsolar standoff distance but does not provide its explicit formula. I think that including a mathematical expression for the second-order polynomial model would greatly enhance accessibility and allow other researchers to readily implement the model.

Citation: https://doi.org/10.5194/egusphere-2025-4530-RC2
- AC2: 'Reply on RC2', Lars Klingenstein, 07 Nov 2025
  
  We would like to express our gratitude to anonymous referee no. 2 for reviewing the preprint and for their contribution to quality control. Their comments are highly appreciated and of great value in improving the scientific quality of the paper.
  
  The manuscript refers to a second-order polynomial representation of the subsolar standoff distance but does not provide its explicit formula. I think that including a mathematical expression for the second-order polynomial model would greatly enhance accessibility and allow other researchers to readily implement the model.
  Thank you for the comment. We have included the formula for the non-reduced polynomial model as equation (B1). r₀ is given in units of R_E; the parameters are dimensionless, since they are expected to be scaled as explained in the text before they are used in the formula. Table B1 has been removed because the written-out formula provides the same information and the table would be redundant. Appendix B has also been renamed, since it now contains the model equation.
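To illustrate what a second-order polynomial model of this kind looks like in practice, the sketch below enumerates the terms of a full quadratic in six scaled inputs and evaluates such a polynomial. It is an illustration only: the feature names are hypothetical labels for the six parameters listed in the abstract, and the coefficients in the example are made up, not those of equation (B1).

```python
from itertools import combinations_with_replacement

# Hypothetical labels for the six scaled input parameters.
FEATURES = ["p_dyn", "AE", "SYM_H", "B_imf", "tilt", "cone"]


def quadratic_terms(names):
    """All monomials of degree 0, 1 and 2 as tuples of feature names."""
    terms = [()]  # the empty tuple stands for the constant term
    for degree in (1, 2):
        terms.extend(combinations_with_replacement(names, degree))
    return terms


def evaluate(coeffs, x):
    """Evaluate a polynomial given {term: coefficient} and {name: scaled value}."""
    total = 0.0
    for term, coeff in coeffs.items():
        product = coeff
        for name in term:
            product *= x[name]
        total += product
    return total


n_full_terms = len(quadratic_terms(FEATURES))

# Made-up coefficients, purely to show the call pattern.
example_coeffs = {(): 10.0, ("p_dyn",): -1.0, ("p_dyn", "p_dyn"): 0.5}
example_r0 = evaluate(example_coeffs, {"p_dyn": 2.0})
```

A full quadratic in six inputs has 28 terms (1 constant, 6 linear, 21 quadratic), so the 14-term expression mentioned in the abstract presumably keeps only a subset of them.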
  
  Citation: https://doi.org/10.5194/egusphere-2025-4530-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (20 Nov 2025) by Oliver Allanson

AR by Lars Klingenstein on behalf of the Authors (20 Nov 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Nov 2025) by Oliver Allanson

RR by Anonymous Referee #1 (20 Nov 2025)

ED: Publish as is (08 Dec 2025) by Oliver Allanson

AR by Lars Klingenstein on behalf of the Authors (09 Dec 2025)

Viewed

Total article views: 1,002 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
821	149	32	1,002	28	29

Views and downloads (calculated since 25 Sep 2025)

Month	HTML	PDF	XML	Total
Sep 2025	521	12	4	537
Oct 2025	183	38	8	229
Nov 2025	95	55	16	166
Dec 2025	22	44	4	70
Jan 2026	0

Viewed (geographical distribution)

Total article views: 995 (including HTML, PDF, and XML) Thereof 995 with geography defined and 0 with unknown origin.

Latest update: 12 Jan 2026

Short summary

We applied machine learning to investigate how the solar wind and Earth's geomagnetic activity control the position of the magnetopause, the boundary layer of Earth's magnetic field. Our results demonstrate that geomagnetic activity strongly influences this boundary and should be incorporated in predictive models. Using data from multiple spacecraft, we developed a simple mathematical description of the magnetopause distance that improves understanding of solar wind–magnetosphere interactions.

