SimTA: A dual-polarization SAR time series rice mapping model based on deep feature-level fusion and spatio-temporal attention
Abstract. Accurate large-scale crop mapping is critical for yield prediction, agricultural disaster monitoring, and global food security. Synthetic Aperture Radar (SAR), with its all-weather, day-and-night imaging capability, plays a vital role in remote-sensing-based crop mapping. However, most existing studies fuse the VV and VH polarization channels at the data level, overlooking the channels' differences in signal-to-noise characteristics and temporal dynamics. This causes feature redundancy or conflict, particularly at rice field edges and in heterogeneous regions, and thereby increases misclassification errors. To address these challenges, this study proposes SimTA, a novel spatiotemporal attention model for rice mapping. (1) A VV-VH feature-level fusion scheme is designed and integrated with a Content-Guided Attention (CGA) fusion method, which effectively exploits the complementary information in the dual-polarized SAR data to achieve deep fusion of spatiotemporal dynamics. (2) A Central Difference Convolution Spatial Extraction (CDCSE) Conv Block is designed, which combines standard and central difference convolutions to enhance sensitivity to variations at rice field edges. (3) A Temporal-Spatial Attention (TSA) Block is developed for efficient spatiotemporal feature integration across the SAR time series, using large-kernel convolutions for spatial feature extraction and a squeeze-and-excitation mechanism to capture long-range temporal dependencies. Extensive experiments compared SimTA with competing models under five fusion schemes. The results show that feature-level fusion consistently outperforms the other schemes, with SimTA achieving the best performance: OA = 91.1 %, F1 score = 90.9 %, and mIoU = 86.2 %. Compared with the baseline SimVP, SimTA improves the F1 score and mIoU by 0.8 % and 2.1 %, respectively.
The CGA-enhanced feature-level fusion further boosts SimTA's performance to OA = 91.5 % and F1 score = 91.4 %. SimTA bridges the gap between existing VV-VH deep fusion schemes and the demands of modern spatiotemporal modeling, offering a more accurate and generalizable approach to large-scale rice mapping.
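The central-difference idea behind the CDCSE Conv Block can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the blending weight `theta`, the naive convolution loop, and the 'valid' padding are all assumptions. Central difference convolution responds to local intensity changes (it is zero on constant regions when `theta = 1`), which is why it helps at field edges.

```python
import numpy as np

def conv2d_valid(x, w):
    # Naive 2D cross-correlation, 'valid' mode (illustrative only).
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def cdc2d(x, w, theta=0.7):
    # Central Difference Convolution: a vanilla convolution minus a
    # theta-weighted term that subtracts the centre pixel's contribution,
    # i.e. y(p0) = sum_n w(n) x(p0+n) - theta * x(p0) * sum_n w(n).
    # theta = 0 recovers standard convolution; theta = 1 gives a pure
    # difference response that vanishes on constant regions.
    vanilla = conv2d_valid(x, w)
    kh, kw = w.shape
    centre = x[kh // 2 : x.shape[0] - kh // 2, kw // 2 : x.shape[1] - kw // 2]
    return vanilla - theta * centre * w.sum()
```

A CDCSE-style block would then mix the standard and central-difference responses, letting the network weight edge sensitivity against smooth-region fidelity.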
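The squeeze-and-excitation mechanism used for temporal weighting in the TSA Block can likewise be sketched: the spatial dimensions are squeezed into one descriptor per timestep, a small bottleneck produces a gate per timestep, and each timestep's feature map is rescaled by its gate. The layer sizes, reduction ratio, and random weights below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_se(x, w1, w2):
    # x: (T, H, W) time series of single-channel feature maps.
    # Squeeze: global average pool over space -> one descriptor per timestep.
    s = x.mean(axis=(1, 2))                      # shape (T,)
    # Excitation: bottleneck MLP (ReLU then sigmoid) -> one gate per timestep.
    gate = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # shape (T,), values in (0, 1)
    # Rescale: emphasise informative timesteps, suppress the rest.
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
T = 6
x = rng.standard_normal((T, 8, 8))
w1 = rng.standard_normal((T // 2, T))   # reduction ratio of 2 (assumed)
w2 = rng.standard_normal((T, T // 2))
y = temporal_se(x, w1, w2)
```

Because the gates are sigmoid outputs, the block can only attenuate timesteps, never amplify them; long-range temporal dependencies enter through the fully connected bottleneck, which sees the whole time series at once.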