Evaluating the Impact of Task Aggregation in Workflows with Shared Resource Environments: use case for the MONARCH application

Marciani, Manuel G.; Castrillo, Miguel; Utrera, Gladys; Acosta, Mario C.; Kinoshita, Bruno P.; Doblas-Reyes, Francisco

doi:10.5194/egusphere-2025-1104

Preprints

20 May 2025


Abstract. High Performance Computing (HPC) is commonly employed to run high-impact Earth System Model (ESM) simulations, such as those for climate change. However, running workflows of ESM simulations on cutting-edge platforms can take a long time because of system congestion and the lack of coordination between current HPC schedulers and workflow manager systems (WfMS). The Earth Sciences community has estimated the time in queue to be between 10 % and 20 % of the runtime in climate prediction experiments, the most time-consuming exercise. To address this issue, the developers of Autosubmit, a WfMS tailored for climate and air quality sciences, have developed wrappers to join multiple subsequent workflow tasks into a single submission. However, although wrappers are widely used in production for community models such as EC-Earth3, MONARCH, and Destination Earth simulations, to our knowledge, their benefits and potential drawbacks have never been rigorously evaluated. In addition, with portability in mind, the developers proposed to wrap depending on the entitlement of the user to the machine. In the widely used Slurm scheduler, this factor is called fair share. The objective of this paper is to quantify the impact of wrapping on queue time and to understand its relationship with the fair share and the job's CPU and runtime request. To do this, we used a Slurm simulator to reproduce the behavior of the scheduler and, to recreate a representative usage of an HPC platform, we generated synthetic static workloads from data of the LUMI supercomputer and a dynamic workload from a past flagship HPC platform. As an example, we introduced jobs modeled after the MONARCH air quality application into these workloads and tracked their queue time. We found that, by simply joining tasks, the total runtime of the simulation is reduced by up to 7 %, and we have indications that this value is larger in reality. In absolute terms, this saving translates to at least eight fewer days wasted in queue for half of the simulations from the IS-ENES3 consortium of CMIP6 simulations. We also identified a high inverse correlation, of -0.87, between the queue time and the fair share factor.
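The mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a chain of dependent tasks that either queue once per task (unwrapped) or once for the whole chain (vertically wrapped), and, purely for illustration, that every submission waits the same time in queue. The function names and all numbers are hypothetical and are not taken from the paper. In a real scheduler the wrapped job requests a longer runtime, which may change its queue time; that is the effect the paper studies.

```python
def workflow_time(n_tasks: int, runtime_h: float, queue_h: float, wrapped: bool) -> float:
    """Total wall-clock hours for a chain of n_tasks dependent tasks.

    Unwrapped: each task is a separate submission and queues once.
    Wrapped (vertical): the whole chain is one submission and queues once.
    Assumption (illustrative only): identical queue time per submission.
    """
    submissions = 1 if wrapped else n_tasks
    return submissions * queue_h + n_tasks * runtime_h


def relative_saving(n_tasks: int, runtime_h: float, queue_h: float) -> float:
    """Fraction of the unwrapped total time that wrapping removes."""
    base = workflow_time(n_tasks, runtime_h, queue_h, wrapped=False)
    wrapped = workflow_time(n_tasks, runtime_h, queue_h, wrapped=True)
    return (base - wrapped) / base


if __name__ == "__main__":
    # Hypothetical chain: 10 tasks of 1 h each, 0.1 h in queue per submission.
    print(f"saving: {relative_saving(10, 1.0, 0.1):.1%}")
```

Under this toy model the saving depends only on the ratio of queue time to runtime and on the number of tasks joined, which is why the fair share and the size of the request matter once the constant-queue-time assumption is dropped.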

Received: 07 Mar 2025 – Discussion started: 20 May 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

05 Dec 2025

Evaluating the impact of task aggregation in workflows with shared resource environments: use case for the MONARCH application

Manuel G. Marciani, Miguel Castrillo, Gladys Utrera, Mario C. Acosta, Bruno P. Kinoshita, and Francisco Doblas-Reyes

Geosci. Model Dev., 18, 9709–9721, https://doi.org/10.5194/gmd-18-9709-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1104', Anonymous Referee #1, 21 May 2025

Publisher’s note: the content of this comment was removed on 26 May 2025 since the comment was posted by mistake.

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC1
- AC1:
  'Reply on RC1', Manuel Giménez de Castro Marciani, 22 May 2025
  
  Dear reviewer,
  Thank you for your prompt answer. We are glad that you consider that our work "achieves significant improvements in processing efficiency," although you do not find this work fitting for the journal.
  First, we find it crucial to clarify that this work does not involve “technical implementation of running ArcGIS toolboxes,” nor does it use “containerization,” nor is it oriented toward “parallel computing optimization.” You can verify in the document that these topics are not addressed in the paper. Instead, it tackles the queue time overhead, which is a transversal issue across all models that run on shared HPC platforms. We use the MONARCH chemical weather prediction system [1] as an example of a high-impact application that delivers forecasts operationally for the region of North Africa, the Middle East, and Europe [2] and is part of the Copernicus quality assessment ensemble [3].
  Regardless of whether this work directly contributes to the advancement of geoscientific models, we believe that the contribution of this paper has a strong impact across various models and studies that have been presented in this journal. The overhead caused by queue time is a cross-cutting issue that is highly relevant to the journal’s readership and to the broader community working with diverse scientific applications. Similarly, we have seen other transversal topics addressed in the journal that have been positively considered. In particular, this paper validates a novel method in the Earth Sciences domain by studying task aggregation and its interplay with the HPC scheduler on highly utilized machines, providing estimated savings of up to 7% of the total runtime of the simulation. To put it in context, a paper published in this journal has reported 20% of overhead [4] on some platforms due to the time jobs spent in the queue. The models considered there were runs of the CMIP6 exercise, including, but not limited to, IFS [5], NEMO [6], ICON [7], and FESOM [8]. We could also highlight the current European flagship Destination Earth workflow [9], which is executed on three shared HPC platforms and thus faces these large queue time overheads and uses task aggregation to mitigate them.
  Moreover, even though aggregation is being used, there is no work such as ours in the literature about how aggregating tasks impacts queue time. So, in this work, we make an effort to understand how Slurm's factors and policies impact the time that a job remains in the queue, which is of interest to everyone who uses HPC on a daily basis.
  And in our field, we have seen how HPC workflows for geoscientific models have gained renewed attention in fields such as climate change research, where decision-making relies on simulations that can take weeks to run on supercomputers. As a result, not only improving the throughput of the models but also optimizing the entire workflow within the digital continuum has become increasingly important.
  For these reasons, along with the references we have added, both from GMD and beyond, we believe there is a solid foundation to support the positive impact of our work on the community, and its relevance to the journal. We sincerely hope this will be taken into consideration during the review process.
  Sincerely,
  The authors.
  [1] https://doi.org/10.5194/gmd-14-6403-2021
  [2] https://dust.aemet.es/
  [3] https://regional-evaluation.atmosphere.copernicus.eu/pages/evaluation/?project=cams2-83&model=MONARCH#
  [4] https://doi.org/10.5194/gmd-17-3081-2024
  [5] https://doi.org/10.5194/gmd-11-3681-2018
  [6] https://doi.org/10.5194/gmd-15-1567-2022
  [7] https://doi.org/10.1002/qj.2378
  [8] https://doi.org/10.1007/s00382-014-2290-6
  [9] https://doi.org/10.1016/j.cliser.2023.100394
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC1
  - RC2: 'Sorry for uploading the wrong review comments. The original comments were for another manuscript, but below are for this manuscript.', Anonymous Referee #1, 23 May 2025
    
    This manuscript investigates the impact of task aggregation (i.e., wrapping multiple workflow tasks into a single submission) on job queue time in high-performance computing (HPC) environments. Using the MONARCH air quality model as a case study, the authors employ a Slurm simulator and synthetic workloads based on LUMI and historical HPC data to assess how task wrapping influences queue time and its correlation with factors such as fair share, CPU request, and runtime. The study finds that task aggregation can reduce total runtime by up to 7% and shows a strong negative correlation (−0.87) between queue time and fair share. Despite the relevance of the topic to HPC workflow optimization, this manuscript suffers from several major deficiencies that preclude publication in its current form. First, the introduction is poorly structured and lacks a coherent logical flow, making it difficult to understand the motivation and novelty of the study. The authors are strongly advised to thoroughly revise the introduction, clearly stating the research gap, objectives, and context within existing literature. Second, the overall structure of the manuscript is not conducive to clarity. It is recommended that the manuscript be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion. Currently, the paper lacks a meaningful discussion section, which is essential for interpreting results, evaluating strengths and weaknesses, and situating the work in a broader scientific context. Third, the content is overly simplistic, with limited methodological depth and superficial analysis, which significantly reduces the academic value of the paper. The authors focus primarily on a technical implementation without formulating or addressing a well-defined scientific problem. 
Furthermore, the figures and quantitative results, while potentially useful in a production context, do not provide sufficient insight or generalizability for a scientific audience. Lastly, the lack of rigorous validation or real-world deployment results further weakens the credibility of the conclusions. In summary, the manuscript lacks scientific depth, a clear problem formulation, and a meaningful discussion of results. Substantial revisions are needed to improve the structure, expand the analytical depth, and provide a more comprehensive evaluation of the methodology and its implications. Based on these significant shortcomings, I recommend rejection of this manuscript.
    
    Citation: https://doi.org/10.5194/egusphere-2025-1104-RC2
    
    AC3: 'Reply on RC2', Manuel Giménez de Castro Marciani, 11 Jun 2025
    
    We answer this reviewer's comments in the RC3 thread.
    
    Citation: https://doi.org/10.5194/egusphere-2025-1104-AC3
RC3:
'Comment on egusphere-2025-1104', Anonymous Referee #1, 26 May 2025

This manuscript investigates the impact of task aggregation (i.e., wrapping multiple workflow tasks into a single submission) on job queue time in high-performance computing (HPC) environments. Using the MONARCH air quality model as a case study, the authors employ a Slurm simulator and synthetic workloads based on LUMI and historical HPC data to assess how task wrapping influences queue time and its correlation with factors such as fair share, CPU request, and runtime. The study finds that task aggregation can reduce total runtime by up to 7% and shows a strong negative correlation (−0.87) between queue time and fair share. Despite the relevance of the topic to HPC workflow optimization, this manuscript suffers from several major deficiencies that preclude publication in its current form. First, the introduction is poorly structured and lacks a coherent logical flow, making it difficult to understand the motivation and novelty of the study. The authors are strongly advised to thoroughly revise the introduction, clearly stating the research gap, objectives, and context within existing literature. Second, the overall structure of the manuscript is not conducive to clarity. It is recommended that the manuscript be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion. Currently, the paper lacks a meaningful discussion section, which is essential for interpreting results, evaluating strengths and weaknesses, and situating the work in a broader scientific context. Third, the content is overly simplistic, with limited methodological depth and superficial analysis, which significantly reduces the academic value of the paper. The authors focus primarily on a technical implementation without formulating or addressing a well-defined scientific problem. 
Furthermore, the figures and quantitative results, while potentially useful in a production context, do not provide sufficient insight or generalizability for a scientific audience. Lastly, the lack of rigorous validation or real-world deployment results further weakens the credibility of the conclusions. In summary, the manuscript lacks scientific depth, a clear problem formulation, and a meaningful discussion of results. Substantial revisions are needed to improve the structure, expand the analytical depth, and provide a more comprehensive evaluation of the methodology and its implications. Based on these significant shortcomings, I recommend rejection of this manuscript.

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC3
- AC2: 'Reply on RC3', Manuel Giménez de Castro Marciani, 11 Jun 2025
  
  We would like to thank the reviewer for their prompt review. We address all of the comments below.
  We did not understand the reviewer's recommendation for the manuscript to “be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion.” We adopted a structure identical to the one recommended, with the only addition being the background section, which explains the fundamental relationship between scheduler factors and time in queue.
  With regard to the “poorly structured” introduction not “clearly stating the research gap, objectives, and context within existing literature,” we believe that all of the reviewer’s concerns are addressed in the introduction. The research gap is stated on line 30, where we mention that “there has been a growing awareness of considering the entire execution of the workflow, taking into account not only the runtime of the most demanding part of it, but also the time spent queuing for resources and post-processing, with possible failures.” Our objective is in lines 74-75 and also in the second to last paragraph of the introduction, where we say that “Our results help to advance the understanding, from the user side, on how to optimize the submission in order to reduce the total queue time of their workflows.” Regarding the context within the literature, we state in lines 52–57 of the introduction that aggregation was identified elsewhere in the weather and climate community and that, as far as we know, there is no other work that tries to validate its usage.
  As for the lack of a meaningful discussion section “evaluating strengths and weaknesses, and situating the work in a broader scientific context,” we believe we do address these points. For example, lines 268-269 draw attention to the relationship between a low fair share factor and aggregation improvement. We also reflect on the possible shortcomings of our methodology in lines 271-272, explaining that we rely on data from an old system that was not always as congested as current flagship systems. We also discuss the negative role of the backfill algorithm in lines 274-275. As for the broader scientific context, we remark again that, as far as we know, this analysis is novel within our context.
  With regard to the reviewer’s comment about the content being “overly simplistic, with limited methodological depth and superficial analysis,” we would like to point out again that aggregation is used across various fields, including climate and weather, materials science (AiiDA with HyperQueue [1]) and bioinformatics (Snakemake with grouping [2]). This work is therefore novel in its evaluation of this technique for solving the queue issue, which has never been tackled head-on in the literature. We therefore did it in the most direct and straightforward way.
  As for our figures and quantitative results not providing “sufficient insight or generalizability for a scientific audience,” we believe we were sufficiently general to cover modern HPC centers, given the available data, using the two distinct experiments. As stated in lines 301–303, “To have both modern job requests and realistic behavior on the usage of the machines, we performed two experiment types.”
  Finally, with regard to “the lack of rigorous validation or real-world deployment,” we agree that real-world deployments would enrich our argument, but executing them would require running multiple expensive concurrent simulations to test aggregation. Additionally, as Acosta et al. [3] have shown, the time in queue depends heavily on the specific platform. Therefore, we would also need to span this experimentation across sites. In conclusion, although we understand the request, we believe that real-world deployments are neither trivial nor cheap to run.
  We value all reviews and comments, as we always strive to ensure that our science is as rigorous and accurate as possible. Therefore, we would now prefer to wait for the remaining reviews before deciding how to proceed. Thank you.
  [1] https://aiida-hyperqueue.readthedocs.io/en/latest/
  [2] https://snakemake.readthedocs.io/en/stable/executing/grouping.html
  [3] https://gmd.copernicus.org/articles/17/3081/2024/
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC2
RC4:
'Comment on egusphere-2025-1104', Anonymous Referee #2, 17 Jun 2025

The core objective of this study is to quantify the impact of task aggregation on job queue times and to understand its relationship with the fair share factor, CPU, and runtime requests in HPC environments. The authors utilized a Slurm simulator, designed from actual Slurm executables, to reproduce scheduler behavior. They conducted experiments using both synthetic static workloads derived from LUMI supercomputer data and a dynamic workload from the decommissioned Curie machine to represent typical HPC usage patterns. The key findings indicate that aggregating tasks can reduce the total workflow runtime by up to 7%, which translates to substantial time savings (e.g., over eight days for half of the IS-ENES3 CMIP6 simulations). Furthermore, the study identified a strong inverse correlation (-0.87) between queue time and the fair share factor.
The quantitative results are compelling and offer valuable insights for optimizing workflow submissions on congested HPC systems. However, there are still significant issues. I agree with another reviewer that the submission reads more like a technical report than a scientific paper. Furthermore, several technical problems should be addressed.
(1) The manuscript states that the observed 7% reduction in runtime could be "larger in reality". Please expand on the specific real-world factors or complexities (e.g., more dynamic system loads, nuanced fair share policies, or varied backfill algorithm effectiveness) that might contribute to a greater benefit in practice. This would enhance the practical applicability and persuasiveness of the findings.
(2) While the Slurm simulator is a strength, a more explicit discussion of its known limitations and how these might influence the generalizability of the results is warranted. For instance, the paper mentions that the simulator "does not have support for dynamic submission times" for constrained jobs as a real Workflow Management System like Autosubmit would. While the authors address this by calculating submission times based on assumed predecessor completion, further detail on the potential implications of this approximation on the reported queue times would be beneficial.
(3) The methodology for controlling fair share in static workloads using "dummy" jobs is clear. However, consider adding a brief discussion on whether this method fully captures the complex and dynamic evolution of fair share in a truly live, highly utilized HPC system.
(4) The paper outlines several categories of wrappers (vertical, horizontal, vertical-horizontal, and horizontal-vertical) but focuses solely on vertical wrappers. A brief justification for this specific focus, and perhaps a suggestion for future research avenues exploring the impact of the other wrapper types, would strengthen the introduction or discussion.

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC4
- AC4: 'Reply on RC4', Manuel Giménez de Castro Marciani, 06 Aug 2025
  
  Please find the answers from the authors in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC4
RC5:
'Comment on egusphere-2025-1104', Anonymous Referee #3, 19 Jun 2025

This paper discusses an interesting use of a SLURM simulator to investigate the use of pilot jobs (which they call using wrappers) to improve the actual throughput of some small air quality simulations. The results obtained suggest an improvement in throughput of about 7%, and identified a high inverse correlation between the queue time and the fair share factor (a key slurm parameter).
Unfortunately the paper has some flaws, in particular around putting itself in context with other work, and on the wider applicability of the results, and until these are rectified, I do not think it should appear in GMD. I recommend major revisions.
I have three significant concerns:

1. The material is not put well in the context of prior work, neither on measurement (e.g. Balaji et al 2017, whose results differ from those presented here - my quick calculation from their table 2 suggests the problem is of order 30%, so I don't know where the 10-20% they use as their motivation came from) nor on the large literature on pilot jobs in distributed and high throughput computing. The concept of wrapping things together is quite well established, but that leads me to my second concern:

2. One of the reasons that this is not done substantially in large-scale climate simulation is that the "chunks" of simulation (their nomenclature) provide natural checkpointing, and with bigger jobs than those they discuss model and hardware failures do lead to the need to checkpoint. Hardware failures are more prevalent now, and with bigger jobs that can be problematic. Perhaps this is why they only did Vertical Wrapping (they are only using one node per job if they only need 96 cores). Possibly horizontal wrapping would be more subject to this issue, but it might also be possible for the workflow manager to cope.

3. The use of MONARCH and 96 core jobs (which are small for these systems) means that these results might not be typical of bigger jobs, not least because most large systems also prioritise bigger jobs, and so the fair-share factor influence may be less important. There is a large problem space that has not been examined. It's certainly the case that the CMIP extrapolation cannot be substantiated without exploring this in their simulations.
This last concern meant that I didn't read through their methodology in great detail, as I think this confounding factor for wider applicability needs to be addressed first.
Minor concerns/notes:
1. There is a language (ASYPD, SYPD) for the difference between the overall throughput and the peak throughput that was introduced in Balaji et al (2017) - a paper that shares a co-author with this one, so it is surprising that language is not used, and that Balaji et al is not cited.

2. I had not seen Abhinit et al 2022, so I looked at it. I do not think it is saying the same thing as stated here. The problem in climate is unlikely to reach a need to wrap more than dozens of tasks (any more and the checkpoint issue dominates), whereas Abhinit et al were looking at wrapping thousands of tasks - in their case the issue is that most SLURM configurations do not have enough memory or resources to deal with the look ahead for queues with thousands of tasks. (Those that do are typically configured for High Throughput Computing, which is a different configuration to those encountered in most HPC sites where climate simulation is undertaken.) Abhinit et al's discussion is more relevant to Dask workflows than simulation workflows. That said, it is indeed the case that HPC sites often limit the number of jobs users can have in queues, which is why tools like Autosubmit and Cylc exist. The issue of number of jobs is not the same as the issue of the queuing time for those jobs.

3. The decision to use pilot jobs for "wrappers" is not surprising, as pilot jobs have a long history, and a significant literature, none of which is referenced here.

4. 7% is interesting, but they then say this corresponds to 8 days of their CMIP project, which means that they were running for 3-4 months. Saving eight days sounds less impressive in that context, and surely suggests on that timescale background workload would influence things by at least the same factor (that is our experience). The use of a short selected trace means this longer-term variability is not sampled, but the more important issue is the influence of JobSizeWeight and JobSizeFactor.

5. It is a pity that the discussion of horizontal wrappers was not followed through as that is likely to result in better throughput for ensembles where there is any risk of a failure during execution.

6. The linear correlation exposed probably comes directly from the equation used by SLURM - surely they should include that equation and discuss the influence of all the key factors and relate to their results?
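For reference, the equation alluded to in the last note is presumably the job priority of Slurm's multifactor priority plugin, which the Slurm documentation gives as a weighted sum of factors. A sketch of that documented form follows; the symbols are shorthand introduced here, not the manuscript's notation. Each W corresponds to a site-configured integer weight (e.g. PriorityWeightFairshare, PriorityWeightJobSize) and each F to a factor normalized to the range 0 to 1.

```latex
\begin{aligned}
P_{\text{job}} = {} & P_{\text{site}}
  + W_{\text{age}}\,F_{\text{age}}
  + W_{\text{assoc}}\,F_{\text{assoc}}
  + W_{\text{fs}}\,F_{\text{fs}}
  + W_{\text{size}}\,F_{\text{size}} \\
& + W_{\text{part}}\,F_{\text{part}}
  + W_{\text{QOS}}\,F_{\text{QOS}}
  + \sum_{t \in \text{TRES}} W_{t}\,F_{t}
  - F_{\text{nice}}
\end{aligned}
```

With all other terms held fixed, the priority is linear in the fair share factor F_fs. How priority maps to queue time depends on the competing workload and on backfill, so a linear relation between queue time and fair share does not follow from this expression alone.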

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC5
- AC5: 'Reply on RC5', Manuel Giménez de Castro Marciani, 06 Aug 2025
  
  Please find the answers from the authors in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC5

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1104', Anonymous Referee #1, 21 May 2025

Publisher’s note: the content of this comment was removed on 26 May 2025 since the comment was posted by mistake.

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC1
- AC1:
  'Reply on RC1', Manuel Giménez de Castro Marciani, 22 May 2025
  
  Dear reviewer,
  Thank you for your prompt answer. We are glad that you consider that our work "achieves significant improvements in processing efficiency," although you do not find this work fitting for the journal.
  First, we find it crucial to clarify that this work does not involve “technical implementation of running ArcGIS toolboxes,” nor does it uses “containerization,” nor it is oriented toward "parallel computing optimization.” You can verify in the document that these topics are not addressed in the paper. Instead, it tackles the queue time overhead that is a traversal issue across all models that run on shared HPC platforms. And we utilize the MONARCH chemical weather prediction system [1] as an example of a high-impact application that delivers forecasts operationally for the region of North Africa, the Middle East, and Europe [2] and is part of the Copernicus quality assessment ensemble [3].
  Regardless of whether this work directly contributes to the advancement of geoscientific models, we believe that the contribution of this paper has a strong impact across various models and studies that have been presented in this journal. The overhead caused by queue time is a cross-cutting issue that is highly relevant to the journal’s readership and to the broader community working with diverse scientific applications. Similarly, we have seen other transversal topics addressed in the journal that have been positively considered. In particular, this paper validates a novel method in the Earth Sciences domain by studying task aggregation and its interplay with the HPC scheduler on highly utilized machines, providing save estimates of up to 7% of the total runtime of the simulation. To put it in context, a paper published in this journal has reported 20% of overhead [4] in some platforms due to the time jobs spent in the queue. The models there considered where runs of the CMIP6 exercise, including — but not limited to — IFS [5], NEMO [6], ICON [7], and FESOM [8]. We could also highlight the current European flagship Destination Earth workflow [9], which is executed on three shared HPC platform, and thus is facing these large overheads from the queue time and task aggregation to mitigate them.
  Moreover, even though aggregation is being used, there is no work such as ours in the literature about how aggregating tasks impacts queue time. So, in this work, we make an effort to understanding how Slurm's factors and policies impact the time that a job remains in queue, which is of interest of everyone that utilizes HPC on a daily basis.
  And in our field, we have seen how HPC workflows for geoscientific models have gained renewed attention in fields such as climate change research, where decision-making relies on simulations that can take weeks to run on supercomputers. As a result, not only improving the throughput of the models but also optimizing the entire workflow within the digital continuum has become increasingly important.
  For these reasons, along with the references we have added, both from GMD and beyond, we believe there is a solid foundation to support the positive impact of our work on the community, and its relevance to the journal. We sincerely hope this will be taken into consideration during the review process.
  Sincerely,
  The authors.
  [1] https://doi.org/10.5194/gmd-14-6403-2021
  [2] https://dust.aemet.es/
  [3] https://regional-evaluation.atmosphere.copernicus.eu/pages/evaluation/?project=cams2-83&model=MONARCH#
  [4] https://doi.org/10.5194/gmd-17-3081-2024
  [5] https://doi.org/10.5194/gmd-11-3681-2018
  [6] https://doi.org/10.5194/gmd-15-1567-2022
  [7] https://doi.org/10.1002/qj.2378
  [8] https://doi.org/10.1007/s00382-014-2290-6
  [9] https://doi.org/10.1016/j.cliser.2023.100394
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC1
  - RC2: 'Sorry for uploading the wrong review comments. The original comments were for another manuscript, but below are for this manuscript.', Anonymous Referee #1, 23 May 2025
    
    This manuscript investigates the impact of task aggregation (i.e., wrapping multiple workflow tasks into a single submission) on job queue time in high-performance computing (HPC) environments. Using the MONARCH air quality model as a case study, the authors employ a Slurm simulator and synthetic workloads based on LUMI and historical HPC data to assess how task wrapping influences queue time and its correlation with factors such as fair share, CPU request, and runtime. The study finds that task aggregation can reduce total runtime by up to 7% and shows a strong negative correlation (−0.87) between queue time and fair share. Despite the relevance of the topic to HPC workflow optimization, this manuscript suffers from several major deficiencies that preclude publication in its current form. First, the introduction is poorly structured and lacks a coherent logical flow, making it difficult to understand the motivation and novelty of the study. The authors are strongly advised to thoroughly revise the introduction, clearly stating the research gap, objectives, and context within existing literature. Second, the overall structure of the manuscript is not conducive to clarity. It is recommended that the manuscript be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion. Currently, the paper lacks a meaningful discussion section, which is essential for interpreting results, evaluating strengths and weaknesses, and situating the work in a broader scientific context. Third, the content is overly simplistic, with limited methodological depth and superficial analysis, which significantly reduces the academic value of the paper. The authors focus primarily on a technical implementation without formulating or addressing a well-defined scientific problem. 
Furthermore, the figures and quantitative results, while potentially useful in a production context, do not provide sufficient insight or generalizability for a scientific audience. Lastly, the lack of rigorous validation or real-world deployment results further weakens the credibility of the conclusions. In summary, the manuscript lacks scientific depth, a clear problem formulation, and a meaningful discussion of results. Substantial revisions are needed to improve the structure, expand the analytical depth, and provide a more comprehensive evaluation of the methodology and its implications. Based on these significant shortcomings, I recommend rejection of this manuscript.
    
    Citation: https://doi.org/10.5194/egusphere-2025-1104-RC2
    
    AC3: 'Reply on RC2', Manuel Giménez de Castro Marciani, 11 Jun 2025
    
    We answer this reviewer's comments in the RC3 thread.
    
    Citation: https://doi.org/10.5194/egusphere-2025-1104-AC3
RC3:
'Comment on egusphere-2025-1104', Anonymous Referee #1, 26 May 2025

This manuscript investigates the impact of task aggregation (i.e., wrapping multiple workflow tasks into a single submission) on job queue time in high-performance computing (HPC) environments. Using the MONARCH air quality model as a case study, the authors employ a Slurm simulator and synthetic workloads based on LUMI and historical HPC data to assess how task wrapping influences queue time and its correlation with factors such as fair share, CPU request, and runtime. The study finds that task aggregation can reduce total runtime by up to 7% and shows a strong negative correlation (−0.87) between queue time and fair share. Despite the relevance of the topic to HPC workflow optimization, this manuscript suffers from several major deficiencies that preclude publication in its current form. First, the introduction is poorly structured and lacks a coherent logical flow, making it difficult to understand the motivation and novelty of the study. The authors are strongly advised to thoroughly revise the introduction, clearly stating the research gap, objectives, and context within existing literature. Second, the overall structure of the manuscript is not conducive to clarity. It is recommended that the manuscript be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion. Currently, the paper lacks a meaningful discussion section, which is essential for interpreting results, evaluating strengths and weaknesses, and situating the work in a broader scientific context. Third, the content is overly simplistic, with limited methodological depth and superficial analysis, which significantly reduces the academic value of the paper. The authors focus primarily on a technical implementation without formulating or addressing a well-defined scientific problem. 
Furthermore, the figures and quantitative results, while potentially useful in a production context, do not provide sufficient insight or generalizability for a scientific audience. Lastly, the lack of rigorous validation or real-world deployment results further weakens the credibility of the conclusions. In summary, the manuscript lacks scientific depth, a clear problem formulation, and a meaningful discussion of results. Substantial revisions are needed to improve the structure, expand the analytical depth, and provide a more comprehensive evaluation of the methodology and its implications. Based on these significant shortcomings, I recommend rejection of this manuscript.

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC3
- AC2: 'Reply on RC3', Manuel Giménez de Castro Marciani, 11 Jun 2025
  
  We would like to thank the reviewer for their prompt review. We address all of the comments below.
  We did not understand the reviewer's recommendation for the manuscript to “be reorganized into five standard sections: Introduction, Data and Methods, Results and Analysis, Discussion, and Conclusion.” We adopted a structure identical to the one recommended, with the only addition being the background section, which explains the fundamental relationship between scheduler factors and time in queue.
  With regard to the “poorly structured” introduction not “clearly stating the research gap, objectives, and context within existing literature,” we believe that all of the reviewer’s concerns are addressed in the introduction. The research gap is stated on line 30, where we mention that “there has been a growing awareness of considering the entire execution of the workflow, taking into account not only the runtime of the most demanding part of it, but also the time spent queuing for resources and post-processing, with possible failures.” Our objective is in lines 74-75 and also in the second to last paragraph of the introduction, where we say that “Our results help to advance the understanding, from the user side, on how to optimize the submission in order to reduce the total queue time of their workflows.” Regarding the context within the literature, we state in lines 52–57 of the introduction that aggregation was identified elsewhere in the weather and climate community and that, as far as we know, there is no other work that tries to validate its usage.
  As for the lack of a meaningful discussion section “evaluating strengths and weaknesses, and situating the work in a broader scientific context,” we believe we do address these points. For example, lines 268-269 draw attention to the relationship between a low fair share factor and aggregation improvement. We also reflect on the possible shortcomings of our methodology in lines 271-272, explaining that we rely on data from an old system that was not always as congested as current flagship systems. We also discuss the negative role of the backfill algorithm in lines 274-275. As for the broader scientific context, we remark again that, as far as we know, this work is novel in its analysis within our context.
  With regard to the reviewer’s comment about the content being “overly simplistic, with limited methodological depth and superficial analysis,” we would like to point out again that aggregation is used across various fields, including climate and weather, materials science (AiiDA with HyperQueue [1]), and bioinformatics (Snakemake with grouping [2]). This work is therefore novel in its evaluation of this technique for solving the queue issue, which has never been tackled head-on in the literature. For that reason, we did it in the most direct and straightforward way.
  As for our figures and quantitative results not providing “sufficient insight or generalizability for a scientific audience,” we believe we were sufficiently general to cover modern HPC centers, given the available data, using the two distinct experiments. As stated in lines 301–303, “To have both modern job requests and realistic behavior on the usage of the machines, we performed two experiment types.”
  Finally, with regard to “the lack of rigorous validation or real-world deployment,” we agree that real-world deployments would enrich our argument, but executing them would require running multiple expensive concurrent simulations to test aggregation. Additionally, as Acosta et al. [3] have shown, the time in queue depends heavily on the specific platform. Therefore, we would also need to span this experimentation across sites. In conclusion, although we understand the request, we believe that real-world deployments are neither trivial nor cheap to run.
  We value all reviews and comments, as we always strive to ensure that our science is as rigorous and accurate as possible. Therefore, we would now prefer to wait for the remaining reviews before deciding how to proceed. Thank you.
  [1] https://aiida-hyperqueue.readthedocs.io/en/latest/
  [2] https://snakemake.readthedocs.io/en/stable/executing/grouping.html
  [3] https://gmd.copernicus.org/articles/17/3081/2024/
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC2
RC4:
'Comment on egusphere-2025-1104', Anonymous Referee #2, 17 Jun 2025

The core objective of this study is to quantify the impact of task aggregation on job queue times and to understand its relationship with the fair share factor, CPU, and runtime requests in HPC environments. The authors utilized a Slurm simulator, designed from actual Slurm executables, to reproduce scheduler behavior. They conducted experiments using both synthetic static workloads derived from LUMI supercomputer data and a dynamic workload from the decommissioned Curie machine to represent typical HPC usage patterns. The key findings indicate that aggregating tasks can reduce the total workflow runtime by up to 7%, which translates to substantial time savings (e.g., over eight days for half of the IS-ENES3 CMIP6 simulations). Furthermore, the study identified a strong inverse correlation (-0.87) between queue time and the fair share factor.
The quantitative results are compelling and offer valuable insights for optimizing workflow submissions on congested HPC systems. However, there are still significant issues. I agree with another reviewer that the submission reads more like a technical report than a scientific paper. Furthermore, several technical problems should be addressed.
(1) The manuscript states that the observed 7% reduction in runtime could be "larger in reality". Please expand on the specific real-world factors or complexities (e.g., more dynamic system loads, nuanced fair share policies, or varied backfill algorithm effectiveness) that might contribute to a greater benefit in practice. This would enhance the practical applicability and persuasiveness of the findings.
(2) While the Slurm simulator is a strength, a more explicit discussion of its known limitations and how these might influence the generalizability of the results is warranted. For instance, the paper mentions that the simulator "does not have support for dynamic submission times" for constrained jobs as a real Workflow Management System like Autosubmit would. While the authors address this by calculating submission times based on assumed predecessor completion, further detail on the potential implications of this approximation on the reported queue times would be beneficial.
(3) The methodology for controlling fair share in static workloads using "dummy" jobs is clear. However, consider adding a brief discussion on whether this method fully captures the complex and dynamic evolution of fair share in a truly live, highly utilized HPC system.
(4) The paper outlines several categories of wrappers (vertical, horizontal, vertical-horizontal, and horizontal-vertical) but focuses solely on vertical wrappers. A brief justification for this specific focus, and perhaps a suggestion for future research avenues exploring the impact of the other wrapper types, would strengthen the introduction or discussion.
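As a reading aid for points (2) and (4), the following is a minimal toy sketch of the approximation described there: submission times for a chain of dependent tasks are fixed in advance by assuming each task is submitted the moment its predecessor completes, and a vertical wrapper collapses the chain into a single submission. It is not the Slurm simulator or Autosubmit code; the names `Task`, `chain_makespan`, and `wrapped_makespan`, and all queue and run times, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    queue_s: float  # assumed time waiting in the queue, in seconds
    run_s: float    # assumed runtime, in seconds


def chain_makespan(tasks):
    """Unwrapped chain: each task is submitted when its predecessor ends."""
    clock = 0.0
    submit_times = []
    for task in tasks:
        submit_times.append(clock)          # precomputed (static) submission time
        clock += task.queue_s + task.run_s  # wait in queue, then run
    return clock, submit_times


def wrapped_makespan(tasks, wrapper_queue_s):
    """Vertical wrapper: one submission, tasks run back to back inside it."""
    return wrapper_queue_s + sum(task.run_s for task in tasks)


# Hypothetical values: 4 chunks, each 10 min in queue and 60 min running.
tasks = [Task(queue_s=600.0, run_s=3600.0) for _ in range(4)]
unwrapped, submit_times = chain_makespan(tasks)
# Assumption: the longer wrapped job waits more than any single task would.
wrapped = wrapped_makespan(tasks, wrapper_queue_s=1500.0)
reduction = 1.0 - wrapped / unwrapped
print(submit_times)
print(unwrapped, wrapped, round(reduction, 4))
```

In this toy setting the wrapper only pays off when its single queue wait is shorter than the sum of the per-task waits. Because the per-task waits are fixed inputs here, the sketch cannot show how a delayed predecessor would shift later submissions, which is the implication point (2) asks the authors to discuss.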

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC4
- AC4: 'Reply on RC4', Manuel Giménez de Castro Marciani, 06 Aug 2025
  
  Please find the answers from the authors in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC4
RC5:
'Comment on egusphere-2025-1104', Anonymous Referee #3, 19 Jun 2025

This paper discusses an interesting use of a SLURM simulator to investigate the use of pilot jobs (which the authors call wrappers) to improve the actual throughput of some small air quality simulations. The results obtained suggest an improvement in throughput of about 7%, and identify a high inverse correlation between the queue time and the fair share factor (a key Slurm parameter).
Unfortunately the paper has some flaws, in particular around putting itself in context with other work, and on the wider applicability of the results, and until these are rectified, I do not think it should appear in GMD. I recommend major revisions.
I have three significant concerns:

1. The material is not put well in the context of prior work, not only on measurement (e.g. Balaji et al 2017, whose results differ from those presented here - my quick calculation from their table 2 suggests the problem is of order 30%, so I don't know where the 10-20% they use as their motivation came from), but also on the large literature on pilot jobs in distributed and high throughput computing. The concept of wrapping things together is quite well established, but that leads me to my second concern:

2. One of the reasons that this is not done substantially in large-scale climate simulation is that the "chunks" of simulation (their nomenclature) provide natural checkpointing, and with bigger jobs than those they discuss model and hardware failures do lead to the need to checkpoint. Hardware failures are more prevalent now, and with bigger jobs that can be problematic. Perhaps this is why they only did Vertical Wrapping (they are only using one node per job if they only need 96 cores). Possibly horizontal wrapping would be more subject to this issue, but it might also be possible for the workflow manager to cope.

3. The use of MONARCH and 96 core jobs (which are small for these systems) means that these results might not be typical of bigger jobs, not least because most large systems also prioritise bigger jobs, and so the fair-share factor influence may be less important. There is a large problem space that has not been examined. It's certainly the case that the CMIP extrapolation cannot be substantiated without exploring this in their simulations.
This last concern meant that I didn't read through their methodology in great detail, as I think this confounding factor for wider applicability needs to be addressed first.
Minor concerns/notes:
1. There is a language (ASYPD, SYPD) for the difference between the overall throughput and the peak throughput that was introduced in Balaji et al (2017) - a paper that shares a co-author with this one, so it is surprising that language is not used, and that Balaji et al is not cited.

2. I had not seen Abhinit et al. 2022, so I looked at it. I do not think it is saying the same thing as stated here. The problem in climate is unlikely to reach a need to wrap more than dozens of tasks (any more and the checkpoint issue dominates), whereas Abhinit et al. were looking at wrapping thousands of tasks - in their case the issue is that most SLURM configurations do not have enough memory or resources to deal with the look-ahead for queues with thousands of tasks. (Those that do are typically configured for High Throughput Computing, which is a different configuration to those encountered in most HPC sites where climate simulation is undertaken.) Abhinit et al.'s discussion is more relevant to Dask workflows than simulation workflows. That said, it is indeed the case that HPC sites often limit the number of jobs users can have in queues, which is why tools like Autosubmit and Cylc exist. The issue of number of jobs is not the same as the issue of the queuing time for those jobs.

3. The decision to use pilot jobs for "wrappers" is not surprising, as pilot jobs have a long history, and a significant literature, none of which is referenced here.

4. 7% is interesting, but they then say this corresponds to 8 days of their CMIP project, which means that they were running for 3-4 months. Saving eight days sounds less impressive in that context, and surely suggests on that timescale background workload would influence things by at least the same factor (that is our experience). The use of a short selected trace means this longer-term variability is not sampled, but the more important issue is the influence of JobSizeWeight and JobSizeFactor.

5. It is a pity that the discussion of horizontal wrappers was not followed through as that is likely to result in better throughput for ensembles where there is any risk of a failure during execution.

6. The linear correlation exposed probably comes directly from the equation used by SLURM - surely they should include that equation and discuss the influence of all the key factors and relate to their results?
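For reference on note 6, the following is a minimal sketch of the weighted sum that Slurm's multifactor priority plugin documents, which is presumably the equation meant there. Each factor is a float between 0 and 1 and is multiplied by a site-configured `PriorityWeight*` setting. Only a subset of the documented terms is shown (the association and TRES terms are omitted), and the function name `job_priority` and every weight and factor value below are hypothetical, not taken from the manuscript or from any real site configuration.

```python
def job_priority(factors, weights, site_factor=0.0, nice=0.0):
    """Weighted sum in the shape of Slurm's multifactor priority formula.

    Each factor is a float in [0, 1]; association and TRES terms are omitted.
    """
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} factor must lie in [0, 1]")
    return site_factor + sum(weights[k] * factors[k] for k in factors) - nice


# Hypothetical weights, standing in for the PriorityWeight* settings.
weights = {"age": 1000, "fairshare": 10000, "job_size": 1000,
           "partition": 0, "qos": 0}

# Two otherwise identical jobs that differ only in fair share factor.
base = {"age": 0.1, "fairshare": 0.2, "job_size": 0.05,
        "partition": 0.0, "qos": 0.0}
high_fs = dict(base, fairshare=0.8)

low = job_priority(base, weights)
high = job_priority(high_fs, weights)
print(low, high, high - low)
```

With the other factors held fixed, the priority is linear in the fair share factor, with slope equal to the fair share weight. Queue time is not given by this formula directly, so a linear priority does not by itself guarantee a linear relation between queue time and fair share; the job size term is where the influence of job size raised in note 4 would enter.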

Citation: https://doi.org/10.5194/egusphere-2025-1104-RC5
- AC5: 'Reply on RC5', Manuel Giménez de Castro Marciani, 06 Aug 2025
  
  Please find the answers from the authors in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1104-AC5

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Manuel Giménez de Castro Marciani on behalf of the Authors (29 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (09 Oct 2025) by Le Yu

RR by Anonymous Referee #2 (03 Nov 2025)

ED: Publish as is (03 Nov 2025) by Le Yu

AR by Manuel Giménez de Castro Marciani on behalf of the Authors (05 Nov 2025)

Journal article(s) based on this preprint

05 Dec 2025

Evaluating the impact of task aggregation in workflows with shared resource environments: use case for the MONARCH application

Manuel G. Marciani, Miguel Castrillo, Gladys Utrera, Mario C. Acosta, Bruno P. Kinoshita, and Francisco Doblas-Reyes

Geosci. Model Dev., 18, 9709–9721, https://doi.org/10.5194/gmd-18-9709-2025, 2025

Data sets

Wrapper Impact Workloads and BSC Slurm Simulator Output of Dynamic Traces from CEA Curie Manuel G. Marciani https://doi.org/10.5281/zenodo.10623439

Full Results from Simulations for Static and Dynamic Workloads Using BSC Slurm Simulator Manuel G. Marciani https://doi.org/10.5281/zenodo.10818813

Wrapper Impact Workloads and BSC Slurm Simulator Output of Static Traces based on Data from LUMI Supercomputer Manuel G. Marciani https://doi.org/10.5281/zenodo.10624403

Interactive computing environment

Static Workload Results Analysis Scripts Manuel G. Marciani https://doi.org/10.5281/zenodo.12801377

Scripts and Files to Add Workflow to Curie Manuel G. Marciani https://doi.org/10.5281/zenodo.12801281

Docker Image of the Computational Earth Sciences Slurm Simulator Manuel G. Marciani https://doi.org/10.5281/zenodo.12801138

Viewed

Total article views: 5,994 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
5,510	364	120	5,994	99	141

Views and downloads (calculated since 20 May 2025)

Month	HTML	PDF	XML	Total
May 2025	198	46	16	260
Jun 2025	210	38	32	280
Jul 2025	88	22	12	122
Aug 2025	928	34	8	970
Sep 2025	3,492	8	4	3,504
Oct 2025	168	12	4	184
Nov 2025	66	24	8	98
Dec 2025	54	34	2	90
Jan 2026	76	20	18	114
Feb 2026	70	32	8	110
Mar 2026	96	46	4	146
Apr 2026	29	27	2	58
May 2026	32	14	2	48
Jun 2026	3	7	0	10
Jul 2026	0

Latest update: 14 Jul 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

Earth System Model simulations are executed with workflows in congested HPC resources. These workflows could be made of thousands of tasks, which, if naively submitted to be executed, might add overheads due to queueing for resources. In this paper we explored a technique of aggregating tasks into a single submission. We related it to a key factor used by the software in charge of the scheduling. We find that this simple technique can reduce up to 7 % of the time spent in queue.

