Preprints
https://doi.org/10.5194/egusphere-2024-1714
25 Jun 2024

Architectural Insights and Training Methodology Optimization of Pangu-Weather

Deifilia Aurora To, Julian Quinting, Gholam Ali Hoshyaripour, Markus Götz, Achim Streit, and Charlotte Debus

Abstract. Data-driven medium-range weather forecasts have recently outperformed classical numerical weather prediction models, with Pangu-Weather (PGW) being the first breakthrough model to achieve this. The Transformer-based PGW introduced novel architectural components, including the three-dimensional attention mechanism (3D-Transformer) in the Transformer blocks and an Earth-specific positional bias term, which accounts for weather states being related to the absolute position on Earth. However, the effectiveness of the different architectural components is not yet well understood. Here, we reproduce the 24-hour forecast model of PGW based on subsampled 6-hourly data. We then present an ablation study of PGW to better understand its sensitivity to the model architecture and training procedure. We find that using a two-dimensional attention mechanism (2D-Transformer) yields a model that is more robust to training, converges faster, and produces better forecasts than the 3D-Transformer. The 2D-Transformer reduces the overall computational requirements by 20–30 %. Further, the Earth-specific positional bias term can be replaced with a relative bias, reducing the model size by nearly 40 %. A sensitivity study comparing the convergence of the PGW model and the 2D-Transformer model shows large batch effects; however, the 2D-Transformer model is more robust to such effects. Lastly, we propose a new training procedure that increases the speed of convergence for the 2D-Transformer model by 30 % without any further hyperparameter tuning.
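
To give a rough feel for why swapping the Earth-specific positional bias for a relative bias shrinks the model, the following sketch counts bias-table entries per attention head under two assumed layouts. The layouts are assumptions, not taken from this page: a Swin-style relative table indexed only by coordinate offsets, and an Earth-specific table that keeps absolute pressure-level and latitude positions inside the window and is stored separately for every window position in the vertical and meridional directions. The window and grid sizes are hypothetical illustrative values, and the sketch does not attempt to reproduce the reported 40 % figure.

```python
def relative_bias_entries(window):
    """Entries per head for an assumed Swin-style relative bias table.

    The table is indexed by the offset between two tokens along each axis,
    so each axis of size w contributes (2 * w - 1) possible offsets.
    """
    wz, wh, ww = window
    return (2 * wz - 1) * (2 * wh - 1) * (2 * ww - 1)


def earth_specific_bias_entries(window, grid):
    """Entries per head for an assumed Earth-specific bias layout.

    Assumption: level and latitude use absolute in-window coordinates
    (w * w index pairs each), longitude uses relative offsets, and one
    table is stored per window position in level and latitude.
    """
    wz, wh, ww = window
    nz, nh, _ = grid
    window_types = (nz // wz) * (nh // wh)
    per_type = (wz * wz) * (wh * wh) * (2 * ww - 1)
    return window_types * per_type


# Hypothetical sizes: (levels, latitude, longitude) tokens.
WINDOW = (2, 6, 12)
GRID = (8, 180, 360)

rel = relative_bias_entries(WINDOW)
earth = earth_specific_bias_entries(WINDOW, GRID)
print(f"relative: {rel}, earth-specific: {earth}, ratio: {earth / rel:.1f}")
```

Under these assumed sizes the Earth-specific table is several hundred times larger than the relative one, which is consistent in spirit with the abstract's claim that replacing it reduces the model size substantially.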

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

13 Dec 2024
Architectural insights into and training methodology optimization of Pangu-Weather
Deifilia To, Julian Quinting, Gholam Ali Hoshyaripour, Markus Götz, Achim Streit, and Charlotte Debus
Geosci. Model Dev., 17, 8873–8884, https://doi.org/10.5194/gmd-17-8873-2024, 2024

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CEC1: 'Comment on egusphere-2024-1714', Juan Antonio Añel, 07 Jul 2024
    • AC1: 'Reply on CEC1', Deifilia To, 25 Jul 2024
  • RC1: 'Comment on egusphere-2024-1714', Tobias Weigel, 17 Jul 2024
    • AC2: 'Reply on RC1', Deifilia To, 08 Aug 2024
      • RC5: 'Reply on AC2', Tobias Weigel, 12 Aug 2024
        • AC5: 'Reply on RC5', Deifilia To, 20 Sep 2024
  • RC2: 'Comment on egusphere-2024-1714', Anonymous Referee #2, 18 Jul 2024
    • AC3: 'Reply on RC2', Deifilia To, 08 Aug 2024
      • RC4: 'Reply on AC3', Anonymous Referee #2, 09 Aug 2024
        • AC6: 'Reply on RC4', Deifilia To, 20 Sep 2024
  • RC3: 'Comment on egusphere-2024-1714', Anonymous Referee #3, 23 Jul 2024
    • AC4: 'Reply on RC3', Deifilia To, 08 Aug 2024

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Deifilia To on behalf of the Authors (09 Sep 2024): author's response, author's tracked changes, manuscript
ED: Publish subject to minor revisions (review by editor) (19 Sep 2024) by Lele Shu
AR by Deifilia To on behalf of the Authors (20 Sep 2024): author's response, author's tracked changes, manuscript
ED: Publish as is (13 Oct 2024) by Lele Shu
AR by Deifilia To on behalf of the Authors (21 Oct 2024)

Viewed

Total article views: 849 (including HTML, PDF, and XML)
  • HTML: 585
  • PDF: 161
  • XML: 103
  • Total: 849
  • Supplement: 30
  • BibTeX: 14
  • EndNote: 13
Views and downloads (calculated since 25 Jun 2024)

Viewed (geographical distribution)

Total article views: 822 (including HTML, PDF, and XML) Thereof 822 with geography defined and 0 with unknown origin.
Latest update: 13 Dec 2024

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Pangu-Weather is a breakthrough machine learning model in medium-range weather forecasting that considers three-dimensional atmospheric information. We show that using a simpler 2D framework improves robustness, speeds up training, and reduces computational needs by 20–30%. We introduce a training procedure that varies the importance of atmospheric variables over time to speed up training convergence. Decreasing computational demand increases accessibility of training and working with the model.
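
As a loose illustration of what "varying the importance of atmospheric variables over time" could look like, the sketch below linearly interpolates per-variable loss weights between a start and an end setting as training progresses and applies them to a mean absolute error. The linear schedule, the function names `variable_weights` and `weighted_mae`, and the example weights are all hypothetical; the summary above does not specify the authors' actual schedule or loss.

```python
import numpy as np


def variable_weights(step, total_steps, start, end):
    """Linearly interpolate per-variable weights from `start` to `end`.

    Hypothetical schedule: the fraction of training completed is clipped
    to [0, 1] and used as the interpolation coefficient.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - frac) * np.asarray(start, dtype=float) + frac * np.asarray(end, dtype=float)


def weighted_mae(pred, target, weights):
    """Weighted mean absolute error over variables.

    `pred` and `target` have shape (variables, ...); the error is averaged
    over all non-variable axes, then combined with normalized weights.
    """
    spatial_axes = tuple(range(1, pred.ndim))
    per_variable = np.abs(pred - target).mean(axis=spatial_axes)
    weights = np.asarray(weights, dtype=float)
    return float((weights * per_variable).sum() / weights.sum())


# Toy example with two variables on a 4 x 4 grid.
pred = np.zeros((2, 4, 4))
target = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 3.0)])

for step in (0, 50, 100):
    w = variable_weights(step, total_steps=100, start=[1.0, 0.0], end=[0.0, 1.0])
    print(step, weighted_mae(pred, target, w))
```

In this toy setup the loss is dominated by the first variable early in training and by the second variable at the end; any real schedule would need to be chosen and validated against the forecast metrics of interest.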