the Creative Commons Attribution 4.0 License.
Merged Observatory Data Files (MODFs): An Integrated Observational Data Product Supporting Process-Oriented Investigations and Diagnostics
Abstract. A large and ever-growing body of geophysical information is measured on campaigns and at specialized observatories as a part of scientific expeditions/experiments. These collections of observed data include many essential climate variables (as defined by the World Meteorological Organization), but are often distinguished by a wide range of additional non-routine measurements that are designed to not only document the state of the environment, but also the drivers that contribute to that state. These field data are not only used to further understand the environmental processes through observation-based studies, but also to provide baseline data to test model performance and to codify understanding to improve predictive capabilities. To address the considerable barriers and difficulty in utilizing these diverse and complex data for observation-model research, the Merged Observatory Data File (MODF) concept has been developed. The MODF combines measurements from multiple instruments into a single file that complies with well-established data format and metadata practices and has been designed to parallel development of corresponding Merged Model Data Files (MMDFs). Using MODF and MMDF protocols will facilitate the evolution of Model Intercomparison Projects into Model Intercomparison and Improvement Projects by putting observation and model data ‘on the same page’ in a timely manner. The MODF concept was developed especially for weather forecast model studies in the Arctic. The surprisingly complex process of implementing MODFs in that context refined the concept itself. Thus this article explains the concept of MODFs by providing details on the issues that were revealed and resolved during this first specific implementation. Detailed instructions are provided on how to make MODFs, and this article can thus be considered a MODF creation manual.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2023-2413', Anonymous Referee #1, 04 Jan 2024
- RC2: 'Comment on egusphere-2023-2413', Anonymous Referee #2, 12 Jan 2024
The paper and contribution are excellent. I simply have suggestions for things to highlight explicitly a little more and to add in terms of stating the significance and use, and hopefully to encourage/guide usage and implementation in the future. Great work!
MODF review
60-64: It could be highlighted further in other parts of the paper that a huge strength of MODF is that the “same variables from observations and models” be created and provided “in easy to use files of the same structure”. This is a huge hurdle and disconnect in most field experiments. If resolved in the MODF manner, it could engage many more modelers in using observatory data. Using the same variable or field name goes a long way toward connecting those dots and communities, which the authors note as MIPs and hopefully also more MIIPs in the future.
90: here and throughout, advocating for serial comma (a, b, and c): suggest inserting comma after rapid
97: The need for MODF might be even more important and necessary for coupled datasets, i.e. those dealing with more than 1 fluid and that attempt to also characterize fluxes and exchanges across the interface(s).
136-163: A challenge we face and have to address in our files is directionality, i.e. positive into ocean or atmosphere, what is negative or positive and why. This also requires standardization. It applies to both the radiative and turbulent fluxes.
164-174: A challenge I see with precipitation datasets is ambiguity in units as a function of or including/contextualized in a time scale. Example: units of mm (rain accumulation) in an hourly dataset vs. a 1-min resolution mm/hr rain rate time series. These are different and units need to be specific.
Table 1 for global attributes, as well as section 5.3 and table 5 for variable attributes, are very useful. The instructions on how to complete attributes vary within a single agency and across agencies, so are confusing to create.
Section 6, Discussion, I wonder how or what the authors think about gradations of flexibility? i.e. what if someone wants to use Celsius instead of K… where do things break down and how much flexibility or personalization can a site do to the MODFs before they have gone off track, before it ceases to be worth the effort of creating it. Is the answer or guiding principle or hierarchy / prioritization of needs that the MODF should faithfully match the MMDF in all ways possible? The more the better? So as to say, if the model is in K, make the obs in K? In this way, can the authors comment on the hierarchy of needs or how best to prioritize effort or adherence to the principles or roadmap outlined here. What is the most generalizable golden rule or guiding principle? How will people know whether they are “doing it right” or “doing it wrong”, regardless of how much work they are putting in. One could put in a lot of work and might not make a very useful MODF. How can that be avoided or set back on track if it's veering off course?
Citation: https://doi.org/10.5194/egusphere-2023-2413-RC2
- AC1: 'Comment on egusphere-2023-2413', Taneil Uttal, 28 Feb 2024
RESPONSE TO REFEREES
The main comments from both reviewers concerned (1) clarification of usage of the H-K schema to create MODFs (how to document versions and include additional metadata; in other words, the article should be a better manual), (2) a request for expanded discussion on the significance of MODFs supporting a 'vision' of facilitating future observation-based model improvement and studies, and (3) questions on future H-K Schema and MODF development.
- Several clarifications and corrections were made in the usage sections.
- The discussion will have an expanded ‘vision’.
- It will be emphasized that the H-K Schema and MODF framework has been intentionally designed for format and platform growth for both general and specific applications with input from the community.
REFEREE #1
Overarching Comment: "The MODF concept is laudable and should be supported"
RESPONSE: Thank you!
Major Comment (1) summarized:
"While not the focus of the paper it feels very remiss not to have a brief 1-2 paragraph discussion around the broader surrounding exchange, archival and dissemination of these data."
RESPONSE:
The MODF concept is meant to address a specific challenge in data interoperability. It touches on some aspects of FAIR (e.g. attribution, PIDs, etc.), but the management of the data once it has been created is beyond the scope of what MODFs are meant to accomplish. We very much appreciate that Referee #1 recognizes the potential for a much wider scope of usage of the MODF concept by data centers, but these specifics of data management are outside the expertise (and influence) of the authors (field scientists and researchers using forecast center model data with field data to improve models). Some language will be added in the discussion suggesting that the data management community consider incorporating the MODF framework into data center operations.
Major Comment (2) summarized:
How do we address measurement collocation issues (interpreted here as the variation in measurement volumes for atmospheric profilers)? The referee suggested looking at "Immler et al 2005".
RESPONSE:
We will investigate whether https://cloud-net.org/ has addressed this issue and provides a standard measurement volume variable attribute that can be adopted by the H-K Schema. However, it is likely that information on profiler measurement volumes will need to be documented for each individual measurement in the variable attributes 'source', 'references', or 'comment' (see Table 5). Also, a specific incarnation of a MODF with multiple profiler data could create a custom variable attribute if a standard is not available and request that their variable attribute become a standard. (Note: We were unable to find this specific Immler reference and would appreciate a citation from the referee if a profiler measurement volume standard is available there.)
Major Comment (3) summarized:
Request for clarification of how MODF versioning will be addressed.
RESPONSE:
Versioning information is recorded primarily using the history attribute. It is recommended that each entry starts with a timestamp, as well as the version number of the variable. This is then followed by a record of the changes made to the data. To further aid reproducibility, it is also recommended that the record contains relevant citations of code versions used as well as details on any processing the data has gone through between the versions. With the combination of this complete record of changes to the data through the different versions, as well as access to the raw data, a user wishing to reproduce results should be able to do so using the information provided. It was remiss of us not to include this information in the article. We will add to the version section (lines 277 - 286).
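The recommended layout of a history entry (timestamp, version number, record of changes, code citations) can be illustrated with a minimal Python sketch. This is an illustration only and not part of the H-K Schema: the function names, the exact text layout of an entry, and the package name `hypothetical-modf-toolkit` are all assumptions made for this example.

```python
from datetime import datetime, timezone


def history_entry(version, changes, when, code_refs=()):
    """Build one history line: timestamp, version, changes, then code citations.

    `when` is expected to be a timezone-aware datetime; it is written in UTC.
    The layout is a hypothetical choice, not a format prescribed by the schema.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp} v{version}: {changes}"
    if code_refs:
        entry += " [code: " + "; ".join(code_refs) + "]"
    return entry


def append_history(existing, entry):
    """Add a new entry on its own line so earlier entries stay intact."""
    return entry if not existing else existing + "\n" + entry


# Two versions of the same (made-up) variable, oldest first.
history = ""
history = append_history(
    history,
    history_entry(
        "1.0",
        "initial merge of tower and radiosonde data",
        datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc),
    ),
)
history = append_history(
    history,
    history_entry(
        "1.1",
        "re-ran quality control after sensor recalibration",
        datetime(2024, 2, 28, 9, 30, tzinfo=timezone.utc),
        code_refs=["hypothetical-modf-toolkit 0.3.2"],
    ),
)
print(history)
```

Keeping every earlier line untouched and only appending means the attribute itself carries the full chain of changes between versions.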
Major Comment (4) summarized:
Is the H-K schema a subset of GeoJSON?
RESPONSE:
The GeoJSON format differs from other GIS standards in that it was written and is maintained NOT by a formal standards organization, but by an Internet working group of developers. The H-K Schema uses ISO/TC 211 standards for digital geographic information.
Major Comment (5) Summarized:
Metadata is substantial but not sufficient.
RESPONSE:
We disagree. Infinite additional metadata can be added in the 'references' variable (and global) attributes and additional variable attributes can be added for specific applications. We will add some language emphasizing that the current H-K schema is a foundation that is meant to be augmented and developed with additional metadata variables as needed.
Minor Comments Referee #1 Summarized:
1) Line 28: ECVs are defined by GCOS, not WMO. Comment accepted.
2) Line 326: Reference format: "Jones et al., 2020". Comment accepted.
3) We agree that Table 2 needs to be reformatted. It shall be done. Thank you.
4) Comment on lat_sonde and lon_sonde accepted. Good point on sonde descent data.
***********************************************************
REFEREE #2
Overarching Comment: "The Paper and Contribution are excellent"
RESPONSE:
Thank you!
Comment Line 60-64 summarized: Emphasize the strength of interoperable model-observation variables.
RESPONSE:
We will do this and will also note that it is important the models extract site data in real time during field experiments as it turns out to be very difficult to do after the fact. This is also important to preserve the model version output that was running at the time of the experiment.
Comment Line 90: Editorial comment accepted.
Comment Line 97: Comment accepted. Will note the importance of the full system scope of MODFs to support energy and heat exchanges across interfaces.
Comment Line 136-163: Comment on directionality: That is an excellent comment. We are investigating whether there is a cross-system (atmosphere-earth-ocean) standard and trying to determine a strategy (besides documenting individual variable directionality with metadata) if there is not. We are also investigating whether models even have a consistent directionality standard (I suspect from some recent analysis I have seen that they do not). In any case, we will see what we can find out and present in the revised manuscript.
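Until a cross-system standard is identified, documenting directionality per variable could look something like the following Python sketch. It is only an illustration under stated assumptions: the attribute name `positive_direction` and its values "up" and "down" are hypothetical and are not defined by the H-K Schema, and the flux values are made up.

```python
VALID_DIRECTIONS = ("up", "down")


def to_convention(values, source_positive, target_positive):
    """Return flux values expressed in the target sign convention.

    Both arguments name the direction in which a flux is counted positive.
    Values are negated only when the two conventions differ.
    """
    for direction in (source_positive, target_positive):
        if direction not in VALID_DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
    sign = 1.0 if source_positive == target_positive else -1.0
    return [sign * v for v in values]


# Observed turbulent heat flux, documented as positive upward (hypothetical attribute).
obs_flux = {
    "units": "W m-2",
    "positive_direction": "up",
    "values": [12.5, -3.0],
}

# Suppose the model output being compared counts fluxes positive downward.
model_positive = "down"
converted = to_convention(
    obs_flux["values"], obs_flux["positive_direction"], model_positive
)
print(converted)
```

The point of the sketch is that the sign flip is only safe when the convention is stored alongside the data; without it, the comparison silently depends on guesswork.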
Comment Line 164-174 – Comment on precipitation units: Precipitation units are, as the referee noted, highly ambiguous and need to be documented clearly in the variable metadata. This will be used as an example of the kind of information that needs to be in the variable metadata (Table 5).
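The referee's example (hourly accumulation in mm versus a 1-min rain rate in mm/hr) can be made concrete with a short Python sketch. It is a hedged illustration only: the dictionary layout, the wording of the 'comment' text, and the unit strings are assumptions for this example rather than content from the manuscript, and the rain values are made up.

```python
import math


def rate_to_accumulation(rates_mm_per_h, step_seconds):
    """Integrate a rain-rate series (mm/h), one value per step, into total mm."""
    return sum(r * step_seconds / 3600.0 for r in rates_mm_per_h)


def accumulation_to_mean_rate(accum_mm, interval_seconds):
    """Convert an accumulation (mm) over an interval into a mean rate (mm/h)."""
    return accum_mm * 3600.0 / interval_seconds


# One hour of steady rain reported as a 1-min rate time series.
rate_var = {
    "units": "mm h-1",
    "comment": "rain rate, one value per 60 s",
    "values": [6.0] * 60,
}

# The same hour reported as a single hourly accumulation.
accum_var = {
    "units": "mm",
    "comment": "rain accumulated over the preceding 3600 s",
    "values": [rate_to_accumulation(rate_var["values"], 60)],
}

hourly_total = accum_var["values"][0]
mean_rate = accumulation_to_mean_rate(hourly_total, 3600)
print(hourly_total, mean_rate)
```

Both variables describe the same rain, yet a 1-min value of 6.0 corresponds to only 0.1 mm of water in that minute, which is why the units and the time scale need to be stated together in the variable metadata.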
Comment: Table 1, Section 5.3, and Table 5 are useful.
RESPONSE: Thank you!
Comment Section 6 Discussion: What will be the gradations of flexibility, prioritization of needs, and guiding principles?
RESPONSE:
This is a good point and one we have argued philosophically among the 23 coauthors as we have developed the MF (MODF + MMDF) framework. The whole point of model-observation interoperability is based on having common standards; however, common standards can be limiting for specific purposes. There are several issues, such as (1) unit conversions can result in actually changing the data, (2) some data producers can only publish highly processed end data sets because of risks of misinterpretations based on raw data that has not been QC'ed, and (3) whether data centers should have MODF/MMDF checkers available as an online tool, etc.
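Issue (1) can be seen in a small Python sketch: converting a temperature from Celsius to kelvin and back in ordinary floating-point arithmetic does not return exactly the value that was stored. This is an illustration only, assuming double-precision floats and a made-up value of 0.1 °C.

```python
def celsius_to_kelvin(t_c):
    """Convert degrees Celsius to kelvin."""
    return t_c + 273.15


def kelvin_to_celsius(t_k):
    """Convert kelvin to degrees Celsius."""
    return t_k - 273.15


original_c = 0.1  # value as originally recorded (illustrative)
round_trip_c = kelvin_to_celsius(celsius_to_kelvin(original_c))

changed = round_trip_c != original_c
difference = abs(round_trip_c - original_c)
print(changed, difference)
```

The difference is far below any instrument uncertainty, but it means the converted file is no longer bit-identical to the source data, which matters for checksums and strict reproducibility.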
The discussion section will be expanded to address these questions and, as much as possible, outline a strategy for future MODF development.
Citation: https://doi.org/10.5194/egusphere-2023-2413-AC1
Siri Jodha Khalsa
Barbara Casati
Gunilla Svensson
Jonathan Day
Jareth Holt
Elena Akish
Sara Morris
Ewan O'Connor
Roberta Pirazzini
Laura X. Huang
Robert Crawford
Zen Mariani
Øystein Godøy
Johanna A. K. Tjernström
Giri Prakash
Nicki Hickmon
Marion Maturilli
Christopher J. Cox