Architectural Insights and Training Methodology Optimization of Pangu-Weather

To, Deifilia Aurora; Quinting, Julian; Hoshyaripour, Gholam Ali; Götz, Markus; Streit, Achim; Debus, Charlotte

doi:https://doi.org/10.5194/egusphere-2024-1714

Deifilia Aurora To, Julian Quinting, Gholam Ali Hoshyaripour, Markus Götz, Achim Streit, and Charlotte Debus

Abstract. Data-driven medium-range weather forecasts have recently outperformed classical numerical weather prediction models, with Pangu-Weather (PGW) being the first breakthrough model to achieve this. The Transformer-based PGW introduced novel architectural components including the three-dimensional attention mechanism (3D-Transformer) in the Transformer blocks and an Earth-specific positional bias term which accounts for weather states being related to the absolute position on Earth. However, the effectiveness of different architectural components is not yet well understood. Here, we reproduce the 24-hour forecast model of PGW based on subsampled 6-hourly data. We then present an ablation study of PGW to better understand the sensitivity to the model architecture and training procedure. We find that using a two-dimensional attention mechanism (2D-Transformer) yields a model that is more robust to training, converges faster, and produces better forecasts than with the 3D-Transformer. The 2D-Transformer reduces the overall computational requirements by 20–30 %. Further, the Earth-specific positional bias term can be replaced with a relative bias, reducing the model size by nearly 40 %. A sensitivity study comparing the convergence of the PGW model and the 2D-Transformer model shows large batch effects: however, the 2D-Transformer model is more robust to such effects. Lastly, we propose a new training procedure that increases the speed of convergence for the 2D-Transformer model model by 30 % without any further hyperparameter tuning.

Received: 05 Jun 2024 – Discussion started: 25 Jun 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 1053 KB)

Supplement (3629 KB)

Download & links

Journal article(s) based on this preprint

13 Dec 2024

Architectural insights into and training methodology optimization of Pangu-Weather

Deifilia To, Julian Quinting, Gholam Ali Hoshyaripour, Markus Götz, Achim Streit, and Charlotte Debus

Geosci. Model Dev., 17, 8873–8884, https://doi.org/10.5194/gmd-17-8873-2024,https://doi.org/10.5194/gmd-17-8873-2024, 2024

Short summary

Country	#	Views	%
United States of America	1	267	32
Germany	2	156	18
China	3	86	10
United Kingdom	4	36	4
undefined	5	24	2


Total:	0
HTML:	0
PDF:	0
XML:	0

Architectural Insights and Training Methodology Optimization of Pangu-Weather

Journal article(s) based on this preprint

Interactive discussion

Interactive discussion

Peer review completion

Journal article(s) based on this preprint

Supplement

Viewed

Viewed (geographical distribution)