Optimizing output operations in high-resolution climate models through dynamic scheduling
Abstract. This study presents a new approach to improving the efficiency of data output in high-resolution climate models. Data are first forwarded to processes that have lighter workloads or finish their tasks earlier, so that these processes serve as temporary storage. The processes then form multiple smaller communication groups to reorganize the data and use an I/O aggregation approach to write in parallel efficiently. A dedicated control process dynamically manages these phases based on the status of each process. To further refine the I/O strategies, we collect performance data from the target machine to build a simulated environment, in which a reinforcement learning agent identifies and tests better parameter configurations. Experiments on two models, GOMO1.0 and LICOM3, show that the method increases output efficiency by factors of 1.54 and 13.1, respectively, compared to the commonly used PnetCDF and MPI-IO. These results suggest that the approach can significantly reduce the overhead of data output and offers a promising way to improve the performance of climate models.
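
To make the forwarding and grouping phases concrete, the following is a minimal sketch in plain Python (no MPI), not the implementation evaluated in this study. All names (`Proc`, `pick_buffers`, `forward_plan`, `make_groups`) are hypothetical, and the selection rule is an assumption: the processes with the earliest estimated finish times act as temporary storage, each remaining process forwards its block to the buffer currently holding the fewest bytes, and the buffers are then split into fixed-size groups in which the first rank could act as the I/O aggregator.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    rank: int
    finish_time: float  # estimated time at which the process finishes computing
    data_bytes: int     # size of the output block the process holds


def pick_buffers(procs, n_buffers):
    """Choose the n_buffers processes expected to finish earliest (assumed rule)."""
    return sorted(procs, key=lambda p: (p.finish_time, p.rank))[:n_buffers]


def forward_plan(procs, buffers):
    """Map each sender rank to a buffer rank using a greedy least-loaded policy.

    Returns (plan, load): plan[sender_rank] = buffer_rank, and load[buffer_rank]
    is the number of bytes the buffer holds after all forwards.
    """
    load = {b.rank: b.data_bytes for b in buffers}
    senders = [p for p in procs if p.rank not in load]
    plan = {}
    # Place the largest blocks first so the greedy choice balances better.
    for p in sorted(senders, key=lambda p: -p.data_bytes):
        target = min(load, key=lambda r: (load[r], r))
        plan[p.rank] = target
        load[target] += p.data_bytes
    return plan, load


def make_groups(buffer_ranks, group_size):
    """Split buffer ranks into smaller groups; group[0] is the assumed aggregator."""
    ranks = sorted(buffer_ranks)
    return [ranks[i:i + group_size] for i in range(0, len(ranks), group_size)]
```

In a real run, a control process would recompute such a plan from the live status of each process, and the group size and number of buffers are the kind of parameters that the reinforcement learning agent could tune in the simulated environment.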