Enhancing Data-Driven Weather Forecasting via Gated Relative Position Encoding and Spatial-Aware Feed-Forward Network
Abstract. Data-driven weather models have emerged to address the immense computational cost of traditional numerical weather prediction (NWP), generating highly accurate global forecasts in seconds. Although Transformer-based architectures have achieved higher accuracy than NWP, their position encodings typically embed limited spatial and temporal context and do not fully account for the time variability, directionality, and location dependency inherent in atmospheric motion. To address this, we introduce the Neighborhood Attention Transformer for atmospheric prediction (AtmoNAT), which has two new architectural components: a Gated Relative Position Encoding (GRPE) and a Spatial-Aware Feed-Forward Network (SAFN). GRPE maintains independent positional biases based on absolute coordinates, which provides location dependency with a negligible increase in model size while capturing the directionality and temporal variation of the atmosphere. SAFN combines parallel input and gating branches with a global positional bias to explicitly model non-local interactions between atmospheric variables and to integrate terrain effects. Evaluated on WeatherBench 2 data at 1.5° spatial resolution, AtmoNAT’s deterministic forecasts show lower prediction errors on key variables up to a 72-hour lead time than other coarse-resolution ensemble forecasts. AtmoNAT also achieves state-of-the-art forecasting performance over global land areas, highlighting the potential of GRPE and SAFN for next-generation weather forecasting.
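As a rough illustration of the two ideas named in the abstract, the sketch below shows (i) a shared relative-position bias scaled by a gate computed from a query's absolute location, and (ii) a feed-forward layer with parallel input and gating branches plus a per-location bias. This is not the AtmoNAT definition of GRPE or SAFN, which the abstract does not specify. The function names, the sigmoid gates, the coordinate features `coord_feat`, and the point at which `loc_bias` enters the gating branch are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_relative_bias(rel_bias, coord_feat, w_gate):
    """Scale a shared relative-position bias by a location-dependent gate.

    rel_bias:   K shared biases, one per neighbor offset in the attention window
    coord_feat: E features of the query's absolute location (hypothetical)
    w_gate:     K x E gate weights (hypothetical)
    Returns K biases that would be added to the attention logits of one query.
    """
    gates = [sigmoid(g) for g in matvec(w_gate, coord_feat)]
    return [g * b for g, b in zip(gates, rel_bias)]


def gated_ffn(x, w_in, w_gate, w_out, loc_bias):
    """Feed-forward layer for one grid cell with input and gating branches.

    x:        C input channels
    w_in:     D x C weights of the input branch
    w_gate:   D x C weights of the gating branch
    w_out:    C x D output projection
    loc_bias: D bias values for this grid cell (placement is an assumption)
    """
    value = matvec(w_in, x)                      # input branch
    gate = [sigmoid(g + b)                       # gating branch + location bias
            for g, b in zip(matvec(w_gate, x), loc_bias)]
    hidden = [v * g for v, g in zip(value, gate)]
    return matvec(w_out, hidden)


# Example: two neighbor offsets, two coordinate features, two channels.
bias = gated_relative_bias([1.0, -2.0], [0.0, 0.0], [[0.3, 0.1], [0.2, 0.4]])
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
out = gated_ffn([1.0, 2.0], identity, zeros, identity, [0.0, 0.0])
```

In this sketch the gate adds only `K x E` parameters on top of the shared bias table, which is one way a location-dependent bias could stay small. Whether AtmoNAT uses this parameterization is not stated in the abstract.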