the Creative Commons Attribution 4.0 License.
Towards standardising output datasets using the numerical obstacle-resolving model MITRAS as an example
Abstract. The publication of well-described FAIR datasets is an important part of atmospheric modelling and research. Data standards ensure that datasets are delivered in a consistent way that is easy to understand for a data user. Standards define how the data is described, i.e. which variable names, descriptions and data formats are used. However, existing model data standards such as the CF conventions are mainly adapted for global or regional scale models. For atmospheric micro-scale obstacle-resolving (urban) models (ORMs), there is no discipline-specific model data standard and the existing ones are not fully suitable to adequately describe ORM datasets. To overcome the lack of standardisation processes, the ATMODAT STANDARD has been developed to promote the publication of FAIR datasets when no discipline-specific standard is available. This paper describes the process of producing standardised model results. The processing for the ORM MITRAS serves as an example to show possible ways of publishing FAIR datasets. The adaptation of the model's post-processing routine M2CDF and the development of a new post-processing routine called NC2ATMODAT are shown. The latter may be applicable for other ORM modellers; its limitations, challenges and further use cases are discussed. Application of the two post-processors allows the preparation of datasets according to the requirements of the CF convention and the ATMODAT STANDARD. The first standardised MITRAS datasets are successfully processed and published.
Status: open (until 01 Jul 2026)
- CEC1: 'Comment on egusphere-2025-5521 - No compliance with the policy of the journal', Juan Antonio Añel, 07 Jan 2026
- RC1: 'Comment on egusphere-2025-5521', Anonymous Referee #1, 22 Jun 2026
The manuscript addresses the publication of well-described, harmonized and reusable datasets from obstacle-resolving atmospheric models. I agree with the general motivation that FAIR data, standardized variable descriptions, and well-defined metadata are increasingly important for the urban micro-meteorology and obstacle-resolving modelling communities. The manuscript also provides a useful technical example of how MITRAS model output can be converted and enriched to comply with CF conventions and the ATMODAT standard.
The manuscript addresses an important issue and provides a useful example, but in my opinion it should be clearer about its scope, discuss the limitations of the proposed approach, and strengthen the connection between the presented tool and the broader problem of ORM data standardization. In particular, the manuscript would also benefit from a discussion of model-developer efforts, computational efficiency, storage overhead, metadata strategy, generalisation to non-Cartesian geometries, and the need for a broader community-driven standardization process.
I recommend the manuscript for publication after major revisions have been considered.
My major concerns and minor comments are outlined in the following.
Major comments
--------------
Comment A) The introduction motivates the need for discipline-specific data standards for obstacle-resolving models, but the actual manuscript mainly describes how MITRAS output was adapted to fulfill CF/ATMODAT requirements. This is useful, but the wider relevance for the ORM community should be made more explicit. At the moment, the manuscript gives the impression that a MITRAS-specific post-processing workflow is presented as a possible solution to a broader community problem. In my view, the manuscript should more clearly distinguish between i) standardizing output for one model, ii) developing a reusable post-processing tool, iii) identifying missing CF standard names, and iv) defining a community-wide data standard for ORM datasets.
The current work is strongest as a technical case study for MITRAS and as a starting point for a broader discussion. If the authors want to make a stronger claim towards ORM-wide standardization, the paper should better explain how this approach could be transferred to other models and what level of model-specific adaptation would be required.
Comment B) The paper correctly states that variable mapping between models is difficult. This point is central and should be developed further. A standard for obstacle-resolving models cannot realistically be derived from one model alone, and even comparison with PALM is not sufficient, because different models use different grids, numerical methods, surface representations, physical formulations, I/O concepts, etc.
A path towards ORM data standardization would likely require a coordinated effort involving representatives from several model developer communities. This is especially relevant because models such as PALM, DALES, OpenFOAM, FITNAH, ENVI-met, MISKAM, MITRAS, etc. may be used not only for urban climate, but also for wind engineering, dispersion, boundary-layer turbulence, plant canopy flows, microphysics or other applications. The manuscript should therefore discuss more explicitly how broad the proposed standardization effort should be and where the limits of a common standard should lie. For example, a useful addition would be a table or structured discussion identifying which parts of the proposed workflow are MITRAS-specific, which are generally applicable to Cartesian-grid models, and which could also be relevant for non-Cartesian or unstructured-grid models.
Comment C) The manuscript argues that standardization benefits data users, which is clear in my opinion. But it is less clear why model developers should invest substantial effort to harmonize model outputs. From a modelling perspective, adapting output to external standards can quickly become inconvenient, time-consuming and costly in terms of I/O performance. Internal model data structures are often optimized for computational performance, memory layout, and I/O efficiency, but not for direct publication. This tension should be discussed more explicitly.
Comment D) I agree that proper variable descriptions need to be more explicit. However, lengthy descriptions in netCDF attributes are not always an ideal solution. Long attributes can be inconvenient to read, difficult to maintain, and sometimes limited in length by tools. The manuscript should better discuss the role of external, versioned, machine-readable documentation in addition to netCDF metadata. This should include, for example, concise CF-compatible metadata in the netCDF files, a harmonized vocabulary for ORM-specific variables, detailed variable tables in the model documentation including the meaning of each variable, clear definitions of grid staggering, masks, surface orientation, and units, and finally information that describes how variables were computed or post-processed.
For model-specific variables, the exact numerical meaning may not be captured sufficiently by a short standard name or long name. In such cases, a stable external documentation reference may be more robust than very long attributes.
Comment E) Limitations of the surface representation should be discussed more explicitly. The proposed representation of building surface variables using east, west, north, south, top and bottom surfaces is clear for Cartesian obstacles where buildings occupy grid cells in a binary way with an all-or-nothing approach. However, this representation appears to be specific to orthogonal Cartesian grids. To give a broader picture, the manuscript should also discuss what happens for more general geometry representations, for example unstructured grids or cut-cell approaches where buildings occupy only part of a grid cell and surfaces need to be represented by surface normal vectors rather than discrete east/west/north/south labels. In such cases, surface orientation cannot be fully described by six discrete directions. A more general ORM standard may require additional variables such as surface area, surface normal vector components, fractional cell occupancy, and possibly surface type. The current approach is useful for MITRAS and similar Cartesian-grid models, but its generality should not be overstated.
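To make the point of Comment E concrete, the following minimal Python sketch shows how the six discrete surface labels correspond to axis-aligned unit normals, and why an obliquely oriented surface has no matching label. It is an illustration only and not part of MITRAS or nc2atmodat; the dictionary, the function name, the axis convention (x east, y north, z up) and the 30-degree wall are all assumptions made for this example.

```python
import math

# Hypothetical mapping of the six discrete surface labels to outward unit
# normals. Assumed axis convention: x points east, y north, z up.
SURFACE_NORMALS = {
    "east": (1.0, 0.0, 0.0),
    "west": (-1.0, 0.0, 0.0),
    "north": (0.0, 1.0, 0.0),
    "south": (0.0, -1.0, 0.0),
    "top": (0.0, 0.0, 1.0),
    "bottom": (0.0, 0.0, -1.0),
}


def label_for_normal(normal, tol=1e-9):
    """Return the discrete label whose normal matches `normal`, else None."""
    for label, ref in SURFACE_NORMALS.items():
        if all(abs(a - b) <= tol for a, b in zip(normal, ref)):
            return label
    return None


# An east-facing wall on a Cartesian grid maps cleanly to a label.
east_wall = (1.0, 0.0, 0.0)

# A wall rotated 30 degrees from east towards north, as could occur with a
# cut-cell or unstructured-grid geometry, matches none of the six labels.
angle = math.radians(30.0)
oblique_wall = (math.cos(angle), math.sin(angle), 0.0)

print(label_for_normal(east_wall))
print(label_for_normal(oblique_wall))
```

Under these assumptions, the oblique wall can only be described by storing its normal vector components, which is the kind of additional variable suggested above.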
Minor comments
--------------
Lines 107–109: I agree with the conclusion that variable mapping between models is not trivial. This point should be emphasized more, because it is one of the central arguments for a broader community effort.
Section 3.2: The section contains many very short subsections. The readability could be improved by merging some of them or presenting the nc2atmodat-workflow as a structured list or table.
Around line 185: Please explain more explicitly why assigning surface variables to cell centers would misrepresent building-boundary positions.
I understand the intended meaning, but this should be explained more clearly for readers who are not familiar with staggered Cartesian grids.
The authors could add a short explanation that a surface variable is physically located at the interface between an atmospheric cell and an obstacle cell, not at the centre of either cell. Assigning it to the cell center would shift the apparent surface location by half a grid spacing and could therefore misrepresent the geometry, especially in visualization or post-processing.
Around line 235: Please justify why latitude/longitude coordinates are needed for each grid point, rather than only a projected coordinate system such as UTM with proper georeferencing metadata. I find the usage of lat/lon somewhat unintuitive for very small Cartesian model domains with domain sizes of only a few kilometers. In many cases, a projected Cartesian coordinate system such as UTM, together with a proper grid mapping and reference point, should be sufficient. GIS software is generally well equipped to transform projected coordinates to lat/lon if needed.
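Returning to the comment on line 185, the half-grid-spacing shift can be illustrated with a minimal Python sketch. It is not MITRAS code; the function name, the grid spacing and the cell coordinates are hypothetical values chosen for this example, and a uniform Cartesian grid is assumed.

```python
def east_face_position(x_centre, dx):
    """x-coordinate of the east face of a cell centred at x_centre, width dx."""
    return x_centre + 0.5 * dx


dx = 2.0                  # hypothetical uniform grid spacing in metres
x_obstacle_centre = 11.0  # hypothetical centre of the outermost obstacle cell
x_air_centre = x_obstacle_centre + dx  # centre of the adjacent atmospheric cell

# The building surface sits on the interface between the two cells.
x_surface = east_face_position(x_obstacle_centre, dx)

# Storing the surface variable at either cell centre misplaces it by dx / 2.
shift_if_air_centre = abs(x_air_centre - x_surface)
shift_if_obstacle_centre = abs(x_obstacle_centre - x_surface)

print(x_surface, shift_if_air_centre, shift_if_obstacle_centre)
```

With these assumed values, the surface lies at 12.0 m, whereas the neighbouring cell centres lie at 11.0 m and 13.0 m, so either choice displaces the wall by half a grid spacing.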
Fig. 6: Just a remark: the PM10 example is useful, but it raises a broader question. How should different passive and reactive tracers be harmonized across ORM datasets? For example, many models may cover PM10 or PM2.5, so these can be easily harmonized. But what happens when further chemical species including precursor substances, BVOCs, pollen, other scalars, age tracers, or purely artificial passive tracers are added? How can these be harmonized in terms of variable names, meaning, etc.? These may differ in units, interpretation, emission source, or particle size definition.
Around line 362 / Table 1: Please define the abbreviation tbuisurf before using it extensively.
Figure 10b: The figure appears to show two colored columns at one location in the vertical cross-section. If these both represent eastward-facing surfaces, please clarify how this can occur geometrically. At a given y-z location, one would normally expect only one east-facing surface for a single building boundary.
Conclusion: The statement on community standardization is important, but should be made more concrete. The authors could outline practical next steps, for example an ORM variable inventory, a candidate list of missing CF standard names, and a community discussion involving multiple model developer communities.
General: Please check the manuscript carefully for minor language issues, for example duplicated words such as “the the”.
Citation: https://doi.org/10.5194/egusphere-2025-5521-RC1
Data sets
High resolution obstacle-resolved model results for the city centre of Hamburg, Germany, from the microscale transport and stream model MITRAS on 21 June 2000 Vivien Voss https://doi.org/10.26050/WDCC/MITRAS_ATMODAT
Simulations for assessing model extensions using the obstacle-resolving model MITRAS Karolin S. Samsel et al. https://doi.org/10.26050/WDCC/WINTER_HAM_MitrasModEx
Interactive computing environment
nc2atmodat Vivien Voss https://doi.org/10.5281/zenodo.17035812
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,066 | 727 | 137 | 1,930 | 214 | 220 |
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Specifically, you do not provide the M2CDF code that you use in your work. You state that it can be obtained by contacting an email address, but we cannot accept this unless you show that publishing it is out of your control or forbidden to you. For this, you first need to clarify the terms of the license of M2CDF and explain what prevents you from publishing it, or from discussing with its authors that they publish it. This last point is especially relevant as the M2CDF software is developed in your institution, which makes it harder to understand what prevents you from publishing it.
The GMD review process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends. Please, therefore, if possible publish the M2CDF code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy. The 'Code and Data Availability' section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor