the Creative Commons Attribution 4.0 License.
Application of regional meteorology and air quality models based on MIPS CPU Platform
Abstract. The MIPS processor architecture is a Reduced Instruction Set Computing (RISC) architecture with advantages in energy consumption and efficiency. Few studies have examined the application of MIPS CPUs to geoscientific numerical models. In this study, a Loongson 3A4000 CPU platform with the MIPS64 architecture was used to establish the runtime environment for the air quality modelling system WRF-CAMx in the Beijing-Tianjin-Hebei region. The results show that the relative errors for the major species (NO2, SO2, O3, CO, PNO3 and PSO4) between the MIPS platform and the X86 benchmark platform are within ± 0.1 %. The maximum Mean Absolute Error (MAE) of the major species is on the order of 10−2 ppbV or μg m−3, the maximum Root Mean Square Error (RMSE) is on the order of 10−1 ppbV or μg m−3, and the Mean Absolute Percentage Error (MAPE) remains within 0.5 %. When simulating a 2 h case with four parallel processes using MPICH, CAMx takes about 15.2 minutes on the Loongson 3A4000 CPU and 4.8 minutes on the Intel Xeon E5-2697 v4 CPU. The single-core computing capability of the Loongson 3A4000 CPU for the WRF-CAMx modelling system is therefore about one-third that of the Intel Xeon E5-2697 v4 CPU, but the thermal design power (TDP) of the Loongson 3A4000 is 30 W, only about one-fifth of the 145 W TDP of the Intel Xeon E5-2697 v4. Thus, the Loongson 3A4000 has higher energy efficiency in the application of the WRF-CAMx modelling system. The results also verify the feasibility of cross-platform porting and the scientific usability of the ported model. This study provides a technical foundation for the porting and optimization of numerical models on MIPS or other RISC platforms.
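The energy-efficiency argument in the abstract comes down to a small piece of arithmetic, sketched below in Python. This is only an illustration under a loud assumption: TDP is used as a stand-in for power draw, so TDP multiplied by wall-clock time is a rough upper-bound proxy for energy to solution, not a measured value. The function name `energy_to_solution_wh` is hypothetical; the runtimes and TDP figures are the ones quoted in the abstract, and the 40 W alternative is the figure raised by Referee #1 in the discussion.

```python
def energy_to_solution_wh(runtime_min: float, tdp_w: float) -> float:
    """Rough energy proxy in watt-hours: TDP (W) times runtime (h).

    Assumption: the CPU draws its full TDP for the whole run. Real draw
    was not measured in the text, so treat the result as indicative only.
    """
    return tdp_w * runtime_min / 60.0


# Figures quoted in the abstract (2 h CAMx case, four MPICH processes).
LOONGSON_RUNTIME_MIN = 15.2
XEON_RUNTIME_MIN = 4.8
LOONGSON_TDP_W = 30.0
XEON_TDP_W = 145.0

# Relative speed and relative TDP, as stated in the abstract.
speed_ratio = XEON_RUNTIME_MIN / LOONGSON_RUNTIME_MIN  # about one-third
tdp_ratio = LOONGSON_TDP_W / XEON_TDP_W                # about one-fifth

loongson_wh = energy_to_solution_wh(LOONGSON_RUNTIME_MIN, LOONGSON_TDP_W)
xeon_wh = energy_to_solution_wh(XEON_RUNTIME_MIN, XEON_TDP_W)
energy_ratio = loongson_wh / xeon_wh

# Sensitivity check with the 40 W figure suggested by Referee #1.
loongson_wh_40 = energy_to_solution_wh(LOONGSON_RUNTIME_MIN, 40.0)
energy_ratio_40 = loongson_wh_40 / xeon_wh

if __name__ == "__main__":
    print(f"speed ratio  : {speed_ratio:.3f}")
    print(f"TDP ratio    : {tdp_ratio:.3f}")
    print(f"energy ratio : {energy_ratio:.3f} (30 W), {energy_ratio_40:.3f} (40 W)")
```

Under this proxy the Loongson run uses less energy than the Xeon run at either TDP value, because the runtime is about three times longer while the nominal power is about four to five times lower. A measured comparison would need actual power readings on both platforms.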
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2962', Anonymous Referee #1, 08 Jan 2024
Overall, this article provides an interesting study showcasing how a given modelling system compares between different platforms. This work validates the robustness of the models.
A few statements need clarification:
1. The 3A4000 CPU runs at 1.8-2.0 GHz, and it seems that the specific platform used for the experiments runs at 1.8 GHz. So, to be fair, 40 W rather than 30 W should be used when citing the power consumption figure. This statement in the Conclusion section also needs a correction: "The platform used in this study is Loongson 3A4000 quad-core 2.0GHz CPU, offering a peak operational speed of 128GFlops".
2. The LoongArch architecture is not directly compatible with the MIPS architecture, but Loongson does provide binary translation software to run MIPS software with a small performance loss.
Citation: https://doi.org/10.5194/egusphere-2023-2962-RC1
AC1: 'Reply on RC1', Qizhong Wu, 31 Jan 2024
RC2: 'Comment on egusphere-2023-2962', Anonymous Referee #2, 09 Jan 2024
This manuscript summarizes an exercise in porting the WRF-CAMx modeling system to a specific computational platform, namely the Loongson 3A4000 CPU platform with MIPS64 architecture. Model simulations of the evolution of air pollution over a 72-hour period, over a domain encompassing the Beijing-Tianjin-Hebei region, are conducted on this platform and benchmarked against comparable simulations on an X86 platform. Additional simulations for a much shorter 2 h period with CAMx are also conducted to examine parallel performance on the different architectures. The authors present a variety of standard statistical measures to demonstrate the cross-platform porting of WRF-CAMx to the Loongson 3A4000 platform they use. The manuscript is generally well written and clearly describes the work conducted by the authors.
However, in my assessment the manuscript lacks scientific novelty and does little to advance either the development and evaluation of the modeling systems examined or a robust assessment of the execution of these models on emerging architectures. In my view, much of what is presented is a standard exercise in porting a numerical modeling system to a different computational platform, together with steps that are routinely undertaken to establish benchmarking on such systems. These tests are commonplace for both the WRF and the CAMx models used here, as well as for other air pollution modeling systems, and are routinely conducted by their respective user communities. While the successful porting of these specific models (WRF v4.0 and CAMx v6.10) to a specific platform (Loongson 3A4000 CPU) may be of interest to a segment of the users of these models who may also be planning to use these specific MIPS CPU platforms, such assessments are commonly discussed in the user forums of these (and similar) modeling systems. I thus struggle to identify the scientific and technical uniqueness of this contribution.
No new developments to either the WRF or CAMx models are described, nor were any changes implemented in the respective model codes to improve their computational performance on the architectures examined. Rather, changes were made to configuration files and makefiles to facilitate the compiling of the model codes, which is somewhat standard practice whenever a model code is ported across platforms or when compilers are updated.
It could be argued that running and porting models across platforms and establishing the “reproducibility” of results through the benchmarking described fall under the scope of “development and technical papers”, but there too the simulation durations and domain coverage are somewhat limited for clearly assessing all technical aspects of running the models on the new architecture.
At several places in the manuscript discussion, it is mentioned that MIPS architectures and the Loongson 3A4000 have the advantage of energy efficiency. However, the simulation design (domain size and simulation duration) does not appear to lend itself to an adequate assessment of potential energy savings. Nor is any analysis presented to robustly infer the tradeoff between computational performance (which seems to be poorer for the MIPS system used here relative to the X86 platform) and the energy savings that may result from transitioning to such a platform.
L113-115: This statement implies that the WRF-CAMx modeling system was developed in Xi’an, China and Milan, Europe – is that an accurate representation of the origin of these models or their linkage? Did Ramboll not develop the requisite code to link CAMx with WRF output?
L303: “stability and availability” should be clearly defined. Is a single 72-hour simulation duration sufficiently long to test the stability of a model on an architecture?
L456-457: How does the parallel performance of CAMx vary with problem size, i.e., the number of grid cells? What fraction of the time is spent in output operations? Is it possible that, with increasing computational size, a single processor would require more time than the two-processor configuration in which one processor is dedicated to I/O? How generalizable are the findings on parallel performance, given the limited domain size and simulation period?
L468-474: It is interesting that the performance of the MIPS platform decreased significantly when the number of parallel processes exceeded 4. Is this unique to the Loongson 3A4000, or is it generalizable to other MIPS systems? Would the same hold for a domain with a significantly larger number of grid cells?
Citation: https://doi.org/10.5194/egusphere-2023-2962-RC2
AC2: 'Reply on RC2', Qizhong Wu, 02 Feb 2024
RC3: 'Comment on egusphere-2023-2962', Anonymous Referee #3, 16 Jan 2024
General comments
The research paper focuses on the utilization of MIPS CPUs, particularly the Loongson 3A4000 CPU platform, in air quality prediction models. It evaluates the performance of the WRF-CAMx air quality modelling system in the Beijing-Tianjin-Hebei region using this platform. The study compares the MIPS CPU platform's performance with a benchmark X86 platform, analyzing various aspects like relative errors for major species, computational efficiency, and energy consumption. The results indicate the feasibility and efficiency of using MIPS architecture for such applications.
This work has the potential to offer valuable guidance for using the MIPS platform for geoscientific modeling. I would suggest that the authors provide a more in-depth discussion on how to exploit the advantages of the MIPS platform. The structure of the paper could be improved, and additional tests are necessary to demonstrate the MIPS platform's performance.
Major comments and questions
- Every abbreviation that appears in the paper, including in the abstract and the main body, should be spelled out in full the first time it is used. For example, the abbreviations 'MIPS' and 'WRF-CAMx' are not spelled out in the abstract.
- The model setups and analysis methods used in this paper should be presented prior to the results in Section 4. The content in Lines 305-309, 323-325, and 404-411 should be consolidated in Sections 2 or 3 as part of the methodology.
- I am curious about the number of sockets available on the motherboard for the Loongson 3A4000 platform. Could the author conduct a larger-scale comparison using more Loongson 3A4000 CPUs compared to the X86 platform as shown in Figure 9?
- Could the author investigate the impact of using different compilers or different compiler parameters on computational performance, in addition to the GNU?
Minor comments:
Line 92: Remove “The remainder is organized as follows.”
Line 113-115: Rephrase this sentence. WRF is developed by NCAR, and CAMx is developed by Ramboll. WRF-CAMx has only been applied in Xi’an, China and Milan, Italy (not Europe).
Line 123-126: The introduction for WRF is not professional. WRF is a meso-scale meteorology model, and you can use it for weather research and prediction. It can be used with a data assimilation technique, and testing its parameterization schemes is a way to improve WRF.
Line 152-154: Why did you use 14 layers rather than the original 34 layers?
Line 195-196: Was FFT used or related to this paper? If not, please remove it.
Line 436: Give the full names of RMSE and std. Why is std in lowercase but RMSE is not? Also, what is the statistical meaning or purpose of using the ratio RMSE/STD?
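As an editorial illustration of the statistics named in the abstract and queried in the comment above, the sketch below computes MAE, RMSE, MAPE and the RMSE/STD ratio in Python. The manuscript's exact formulas are not reproduced on this page, so the standard textbook definitions are assumed, with the X86 output treated as the reference and STD taken as the population standard deviation of that reference. All function names and the sample values are hypothetical and are not taken from the study.

```python
import math
from statistics import pstdev


def mae(test, ref):
    """Mean Absolute Error between test and reference values."""
    return sum(abs(t - r) for t, r in zip(test, ref)) / len(ref)


def rmse(test, ref):
    """Root Mean Square Error between test and reference values."""
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(test, ref)) / len(ref))


def mape(test, ref):
    """Mean Absolute Percentage Error in percent.

    Assumes no reference value is zero.
    """
    return 100.0 * sum(abs((t - r) / r) for t, r in zip(test, ref)) / len(ref)


def rmse_over_std(test, ref):
    """RMSE normalised by the population standard deviation of the reference.

    Assumption: STD refers to the reference series; a value well below 1
    means the platform difference is small relative to natural variability.
    """
    return rmse(test, ref) / pstdev(ref)


# Hypothetical concentrations (not data from the study).
reference_x86 = [10.0, 20.0, 30.0, 40.0]
ported_mips = [10.1, 19.9, 30.1, 39.9]

if __name__ == "__main__":
    print("MAE      :", mae(ported_mips, reference_x86))
    print("RMSE     :", rmse(ported_mips, reference_x86))
    print("MAPE (%) :", mape(ported_mips, reference_x86))
    print("RMSE/STD :", rmse_over_std(ported_mips, reference_x86))
```

Read this way, RMSE/STD expresses the cross-platform difference as a fraction of the variability of the reference field, which makes species with different units and magnitudes comparable on one scale.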
Citation: https://doi.org/10.5194/egusphere-2023-2962-RC3
AC3: 'Reply on RC3', Qizhong Wu, 22 Feb 2024
Zehua Bai
Yiming Sun
Huaqiong Cheng