Enabling High Performance Cloud Computing for the Community Multiscale Air Quality Model (CMAQ) version 5.3.3: Performance Evaluation and Benefits for the User Community

Efstathiou, Christos I.; Adams, Elizabeth; Coats, Carlie J.; Zelt, Robert; Reed, Mark; McGee, John; Foley, Kristen M.; Sidi, Fahim I.; Wong, David C.; Fine, Steven; Arunachalam, Saravanan

doi:https://doi.org/10.5194/egusphere-2023-3045

Preprints

26 Mar 2024

Abstract. The Community Multiscale Air Quality (CMAQ) Model is a local-to-hemispheric scale numerical air quality modeling system developed by the U.S. Environmental Protection Agency (USEPA) and supported by the Center for Community Modeling and Analysis System (CMAS). CMAQ is used for regulatory purposes by the USEPA program offices and by state and local air agencies. It is also widely used by the broader global research community to simulate and understand complex air quality processes, and for computational environmental fate and transport studies and climate and health impact studies. Leveraging state-of-the-science cloud computing resources for high performance computing (HPC) applications, CMAQ is now available as a fully tested, publicly available technology stack (HPC cluster and software stack) for two major cloud service providers (CSPs). Specifically, CMAQ configurations and supporting materials have been developed for use on their HPC clusters, including extensive online documentation, tutorials, and guidelines to scale and optimize air quality simulations using their services. These resources allow modelers to rapidly bring together CMAQ, cloud-hosted datasets, and visualization and evaluation tools on ephemeral clusters that can be deployed quickly and reliably worldwide. Described here are considerations in CMAQ v5.3.3 cloud use and the supported resources for each CSP, presented through a benchmark application suite that was developed as an example of a typical simulation for testing and verifying components of the modeling system. The outcomes of this effort are to provide findings from performing CMAQ simulations on the cloud using popular vendor-provided resources, to enable the user community to adapt these resources to their own needs, and to identify specific areas of potential optimization with respect to storage and compute architectures.

Received: 18 Dec 2023 – Discussion started: 26 Mar 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Preprint (PDF, 4431 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

19 Sep 2024

Enabling high-performance cloud computing for the Community Multiscale Air Quality Model (CMAQ) version 5.3.3: performance evaluation and benefits for the user community

Christos I. Efstathiou, Elizabeth Adams, Carlie J. Coats, Robert Zelt, Mark Reed, John McGee, Kristen M. Foley, Fahim I. Sidi, David C. Wong, Steven Fine, and Saravanan Arunachalam

Geosci. Model Dev., 17, 7001–7027, https://doi.org/10.5194/gmd-17-7001-2024, 2024

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-3045', Anonymous Referee #1, 21 Apr 2024

The manuscript introduces the adaptation and optimization of CMAQ 5.3.3 for high-performance cloud computing environments. This paper fits the scope of GMD and can serve as a detailed reference to showcase how the CMAQ model can enhance computational efficiency and accessibility for diverse modeling tasks. Here are some minor suggestions that could be addressed to further improve the paper.
Line 115-125: How about illustrating the CMAQ workflow using a figure? It would help readers better understand how CMAQ works.
Line 130-150: This is lengthy and somewhat difficult to follow. Please break it into multiple paragraphs to enhance readability.
Line 160: Figure 1 only shows the rectangle of CONUS but lacks grid representation. I suggest exemplifying the grids over an area of interest with a zoom-in minimap.
Line 165-290: Section 3 offers valuable insights into CMAQ deployment from an engineering perspective. However, to align more closely with the form of a scientific paper, consider pivoting towards system or experiment design to elucidate the methodology behind this work, while relocating detailed technical tutorials to an appendix.
Line 335-410: How about combining Figure 6/7, 8/9/10, 11/12/13/14/15/16? It is a little bit hard for readers to compare the results across multiple figures.
Line 495-560: The current discussion could be streamlined and organized into subtopics, such as the strengths of the proposed cloud-based implementation, scalability/reusability, limitations, and future research recommendations. Meanwhile, a conclusion section is recommended to summarize the research findings from this work.

Citation: https://doi.org/10.5194/egusphere-2023-3045-RC1
- AC1: 'Reply on RC1', Saravanan Arunachalam, 20 Jun 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2023-3045/egusphere-2023-3045-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2023-3045-AC1
RC2: 'Comment on egusphere-2023-3045', Anonymous Referee #2, 23 Apr 2024

This manuscript presents a research effort to enable CMAQ modeling and data analysis on high performance cloud computing. CMAQ is a popular air quality model and has been widely applied for numerous regulatory and research purposes. The application of CMAQ, however, is still somewhat limited, since it requires preparing all inputs and run scripts on a single server. Enabling CMAQ on cloud servers would make it more convenient to run the model and would probably promote its application to a broader community. The study is well worth the effort. The manuscript provides clear descriptions of the flow chart and sufficient details of each section of the modeling system, and also demonstrates the changes in cost and efficiency clearly. Therefore, I would recommend that it be accepted for publication if the following minor comments are properly addressed.

comment#1: Computational efficiency for traditional CMAQ does not increase linearly with more CPU cores, and data I/O is a major reason. For cloud-based CMAQ, however, horizontal advection appears to be the most time-consuming process, which is a little surprising. Please provide a brief discussion of this change.

comment#2: I assume the current cloud version doesn’t support the two-way WRF-CMAQ mode; please clarify whether this is correct. Also, does it support the online modules for MEGAN and dust emissions?

comment#3: Fig. 3 and Fig. 4 are not mentioned in the main text. It’s necessary to briefly explain the flowcharts, although the figures are largely self-explanatory.

comment#4: Fig. 8 and Fig. 9: It’s interesting to note that pinning speeds up vertical diffusion on Azure but slows it on AWS; please provide a brief discussion explaining the difference.

comment#5: It’s important to note that only a few variables are saved to a 1-layer conc file during the test in Section 4.2.2, while in real applications there may be many more variables and layers. Please provide a brief discussion to justify whether the test runs shown in this study are representative of typical CMAQ applications.

comment#6: Fig. 18, Fig. 19, and Fig. 21: Showing screenshots is straightforward but not well suited to journal publication; it would be better to summarize the important numbers in a concise figure or table.

Citation: https://doi.org/10.5194/egusphere-2023-3045-RC2
- AC2: 'Reply on RC2', Saravanan Arunachalam, 20 Jun 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2023-3045/egusphere-2023-3045-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2023-3045-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Saravanan Arunachalam on behalf of the Authors (20 Jun 2024): Author's response, Author's tracked changes, Manuscript

ED: Publish subject to minor revisions (review by editor) (01 Jul 2024) by Xiaomeng Huang

ED: Publish as is (02 Jul 2024) by Xiaomeng Huang

AR by Saravanan Arunachalam on behalf of the Authors (13 Jul 2024)

Viewed

Total article views: 457 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
329	102	26	457	14	17

Views and downloads (calculated since 26 Mar 2024)

Month	HTML	PDF	XML	Total
Mar 2024	91	18	0	109
Apr 2024	110	33	10	153
May 2024	36	12	2	50
Jun 2024	31	11	3	45
Jul 2024	30	13	6	49
Aug 2024	25	11	3	39
Sep 2024	6	4	2	12
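
As a quick consistency check on the figures above, the following minimal sketch sums the monthly rows and compares them with the reported totals (HTML 329, PDF 102, XML 26, Total 457). The variable names are illustrative only; the numbers are copied from the tables on this page.

```python
# Monthly rows copied from the table above: (month, HTML, PDF, XML, Total).
monthly = [
    ("Mar 2024", 91, 18, 0, 109),
    ("Apr 2024", 110, 33, 10, 153),
    ("May 2024", 36, 12, 2, 50),
    ("Jun 2024", 31, 11, 3, 45),
    ("Jul 2024", 30, 13, 6, 49),
    ("Aug 2024", 25, 11, 3, 39),
    ("Sep 2024", 6, 4, 2, 12),
]

# Totals as reported in the "Viewed" summary table.
reported = {"HTML": 329, "PDF": 102, "XML": 26, "Total": 457}

# Each row's Total should equal HTML + PDF + XML for that month.
rows_consistent = all(
    html + pdf + xml == total for _, html, pdf, xml, total in monthly
)

# Column sums over all months, keyed like the reported totals.
column_sums = {
    "HTML": sum(row[1] for row in monthly),
    "PDF": sum(row[2] for row in monthly),
    "XML": sum(row[3] for row in monthly),
    "Total": sum(row[4] for row in monthly),
}

print("rows consistent:", rows_consistent)
print("column sums match reported totals:", column_sums == reported)
```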

Viewed (geographical distribution)

Total article views: 461 (including HTML, PDF, and XML) Thereof 461 with geography defined and 0 with unknown origin.

Latest update: 19 Sep 2024

Short summary

We present a summary of enabling high performance computing of CMAQ, a state-of-the-science regional-scale air quality model, on two popular cloud computing platforms, by documenting the technologies, model performance, scaling, and relative merits. We anticipate that this may be a new paradigm for future model applications that are computationally intensive in space and time. We initiated this work due to a growing need to leverage cloud computing advances and to ease the learning curve for new users.
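
To make the scaling terminology concrete, here is a minimal sketch of how speedup and parallel efficiency are commonly derived from wall-clock timings at different core counts. The `ScalingPoint` type, the function names, and all timings below are hypothetical placeholders for illustration; they are not taken from the paper or from any CMAQ tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPoint:
    """One benchmark run: core count and total wall-clock time."""
    cores: int
    wall_seconds: float


def speedup(base: ScalingPoint, run: ScalingPoint) -> float:
    """Speedup of `run` relative to the baseline run."""
    return base.wall_seconds / run.wall_seconds


def parallel_efficiency(base: ScalingPoint, run: ScalingPoint) -> float:
    """Speedup divided by the core-count ratio (1.0 means ideal linear scaling)."""
    return speedup(base, run) / (run.cores / base.cores)


# Hypothetical timings for illustration only; NOT measurements from the paper.
runs = [
    ScalingPoint(cores=96, wall_seconds=4000.0),
    ScalingPoint(cores=192, wall_seconds=2500.0),
    ScalingPoint(cores=384, wall_seconds=1600.0),
]

base = runs[0]
for run in runs:
    print(
        f"{run.cores:4d} cores: speedup {speedup(base, run):.2f}, "
        f"efficiency {parallel_efficiency(base, run):.2f}"
    )
```

An efficiency that falls as cores are added, as in these placeholder numbers, is the sub-linear scaling pattern that Referee #2 refers to in comment#1.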

