https://doi.org/10.5194/egusphere-2022-781
22 Aug 2022
Status: this preprint is open for discussion.

Parallelized Domain Decomposition for Multi-Dimensional Lagrangian Random Walk, Mass-Transfer Particle Tracking Schemes

Lucas Schauer [1], Michael J. Schmidt [2], Nicholas B. Engdahl [3], Stephen D. Pankavich [1], David A. Benson [4], and Diogo Bolster [5]
  • [1] Department of Applied Mathematics and Statistics, Colorado School of Mines, Golden, CO, 80401, USA
  • [2] Center for Computing Research, Sandia National Laboratories, Albuquerque, NM 87185, USA
  • [3] Department of Civil and Environmental Engineering, Washington State University, Pullman, WA, 99164, USA
  • [4] Hydrologic Science and Engineering Program, Department of Geology and Geological Engineering, Colorado School of Mines, Golden, CO, 80401, USA
  • [5] Department of Civil and Environmental Engineering and Earth Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA

Abstract. Lagrangian particle tracking schemes allow a wide range of flow and transport processes to be simulated accurately, but a major challenge is implementing the inter-particle interactions efficiently. This article develops a multi-dimensional, parallelized domain decomposition (DDC) strategy for mass-transfer particle tracking (MTPT) methods, in which particles exchange mass dynamically. We show that this strategy can be parallelized efficiently, using large numbers of CPU cores to reduce run times. To validate the approach and our theoretical predictions, we focus on a well-known benchmark problem with pure diffusion, for which analytical solutions are well established in any number of dimensions. We investigate different procedures for tiling the domain in two and three dimensions (2-d and 3-d), as this type of formal DDC construction is currently limited to 1-d. Because each tiling yields distinct results in both accuracy and run time, we prescribe an optimal tiling based on physical problem parameters and the number of available CPU cores. We further extend the most efficient technique to 3-d for comparison, leading to an analytical discussion of the effect of dimensionality on strategies for implementing DDC schemes. Increasing computational resources (cores) within the DDC method produces a trade-off between inter-node communication and on-node work. For an optimally subdivided diffusion problem, the 2-d parallelized algorithm achieves nearly perfect linear speedup relative to the serial run up to around 2700 cores, reducing a 5-hour simulation to 8 seconds, while the 3-d algorithm maintains appreciable speedup up to 1700 cores.
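
As a rough check on the headline figures, the short sketch below works through the arithmetic implied by the abstract. It assumes the classical definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by core count), which the abstract does not spell out, and it takes the rounded values "5 hours", "8 seconds", and "around 2700 cores" at face value. The function name `speedup_and_efficiency` is hypothetical.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, cores):
    """Classical speedup S = T_serial / T_parallel and efficiency E = S / cores.

    These definitions are an assumption; the abstract may measure
    speedup against a different baseline.
    """
    speedup = serial_seconds / parallel_seconds
    efficiency = speedup / cores
    return speedup, efficiency


# Rounded figures quoted in the abstract for the 2-d case.
serial_seconds = 5 * 3600   # "5-hour simulation"
parallel_seconds = 8        # "8 seconds"
cores = 2700                # "around 2700 cores"

speedup, efficiency = speedup_and_efficiency(serial_seconds, parallel_seconds, cores)
print(f"speedup ~ {speedup:.0f}x, efficiency ~ {efficiency:.2f}")
```

Because all three inputs are rounded, the resulting numbers are indicative only and should not be read as the measured values.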

Status: open (until 17 Oct 2022)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2022-781', Anonymous Referee #1, 12 Sep 2022
    • AC1: 'Reply to RC1', Lucas Schauer, 21 Sep 2022
  • RC2: 'Comment on egusphere-2022-781', Anonymous Referee #2, 23 Sep 2022

Viewed

Total article views: 279 (including HTML, PDF, and XML)
  • HTML: 217
  • PDF: 52
  • XML: 10
  • Total: 279
  • BibTeX: 1
  • EndNote: 2
Views and downloads (calculated since 22 Aug 2022)

Viewed (geographical distribution)

Total article views: 265 (including HTML, PDF, and XML), of which 265 have a defined geography and 0 are of unknown origin.
Latest update: 28 Sep 2022
Short summary
We develop a multi-dimensional, parallelized domain decomposition strategy for mass-transfer particle tracking methods in two and three dimensions, investigate different procedures for decomposing the domain, and prescribe an optimal tiling based on physical problem parameters and the number of available CPU cores. For an optimally subdivided diffusion problem, the parallelized algorithm achieves nearly perfect linear speedup in comparison with the serial run up to thousands of cores.
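
To make the idea of "different procedures for decomposing the domain" concrete, the sketch below compares two ways of tiling a square 2-d domain among a given number of cores: vertical slices and a near-square checkerboard. It is an illustration under stated assumptions, not the authors' algorithm. The helper names (`most_square_grid`, `interior_boundary_length`, `owner_tile`) are hypothetical, and the total length of shared tile edges is used only as a crude proxy for inter-tile communication.

```python
import math


def most_square_grid(n_cores):
    """Return (nx, ny) with nx * ny == n_cores and nx <= ny, as square as possible."""
    nx = math.isqrt(n_cores)
    while n_cores % nx != 0:
        nx -= 1
    return nx, n_cores // nx


def interior_boundary_length(nx, ny, side=1.0):
    """Total length of edges shared by neighbouring tiles in a side x side domain."""
    return (nx - 1) * side + (ny - 1) * side


def owner_tile(x, y, nx, ny, side=1.0):
    """Row-major index of the tile containing (x, y); the upper edge is clamped."""
    ix = min(int(x / side * nx), nx - 1)
    iy = min(int(y / side * ny), ny - 1)
    return iy * nx + ix


n_cores = 16
slices = (1, n_cores)                  # one strip per core
checker = most_square_grid(n_cores)    # (4, 4) for 16 cores

print("slices boundary length:      ", interior_boundary_length(*slices))
print("checkerboard boundary length:", interior_boundary_length(*checker))
print("tile owning (0.3, 0.8):      ", owner_tile(0.3, 0.8, *checker))
```

In this toy setting the checkerboard has less shared boundary than the slices for the same core count, which hints at why the choice of tiling can matter for run time. The sketch ignores particle counts per tile and accuracy effects, both of which the summary identifies as relevant to choosing an optimal tiling.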