The Path to FAIR Research Models: Lessons Learned
Abstract. Numerical modeling of Earth surface processes emerged as an important scientific tool in the late 1960s to mid-1970s, driven by the development of finite element methods in computer science. These advancements, initially applied in civil engineering, enabled scientists to simulate complex geological phenomena. At that time, models were often only described in publications, access was limited to researchers with direct connections to the developers, and the code was rarely documented for reuse, limiting their application beyond the original research context. The FAIR principles (Findability, Accessibility, Interoperability, and Reusability) as applied to data began to take shape in the 21st century with the rise of open science, digital repositories, and standardized data sharing frameworks. In the late 2010s, grassroots movements began to apply some of the FAIRness goals to numerical models. Subsequently, more formalized FAIR model principles were developed that addressed the specific needs of the scientific modeling community, resulting in the formulation of the FAIR principles for research software (FAIR4RS).
In this study, we examine the development and implementation of strategies by two geoscience research infrastructures – the CSDMS (Community Surface Dynamics Modeling System) Model Repository and the U.S. Geological Survey Model Catalog – to enhance the FAIRness of models guided by FAIR4RS. Some of the development and implementation efforts described predate the formalization of FAIR and FAIR4RS principles, making this an ongoing and adaptive process. We evaluate the temporal progression towards increased FAIR4RS alignment across three phases of research infrastructure development: prototype, refinement, and growth & iteration. Although certain principles were more straightforward to implement early in prototypes of the catalog infrastructures, others required broader community collaboration during refinement, and some continue to pose practical challenges in the growth and iteration phase. By tracing these dynamics, our aim is to provide insights that can guide other modeling initiatives in effectively adopting FAIR4RS principles within their communities.
Overall, I appreciate the content of this paper and the work the authors put into the study and into writing it up. I think this is a valuable resource for the software and information science community, and I hope that it will be circulated more widely than just in the Earth science community.
In particular, Section 3.1 is quite useful.
General comments:
The authors could mention CIG (https://community.geodynamics.org) somewhere as complementary to the work studied in this paper.
Adding blank lines to separate new paragraphs would be helpful. This is done in some parts of the paper, but not others. In particular, it would be helpful in the references section.
I find some of the terminology here a little confusing. When I hear "models" today, I think of machine learning or AI. The models here are more modeling/simulation programs or functions. Note that the FAIR4RS principles are about Research Software. There is also a group working on FAIR4ML, where ML is machine learning models. If the authors want to keep using "models", the term should be clearly defined at the start.
Similarly, when looking at Figure 2, it's unclear to me whether "publish data model and records" refers to the software or to the data that it produces. And in the F row, what metadata is being discussed: metadata about the data or metadata about the software? This is made clearer in the paper text, but the figure/caption could also be clarified.
Specific comments (with line numbers):
46 - perhaps mention https://www.researchsoft.org/tf-actionable-fair4rs/
75 - Please add the names of the two model catalogs to the caption
332 - I strongly disagree with the idea here that using a CC0 dedication/license is appropriate. While Creative Commons says this can be done (https://wiki.creativecommons.org/wiki/CC0_FAQ#May_I_apply_CC0_to_computer_software.3F_If_so.2C_is_there_a_recommended_implementation.3F), it mentions that OSI does not approve this. Given that many projects consider the use of an OSI-approved license to be the definition of open source, software that has a CC0 dedication may not be considered open source. Also, even CC says that CC0 is a dedication, not a license. See https://opensource.org/blog/public-domain-is-not-open-source for OSI's view.
455 - It might be worth mentioning LLMs here, as this is the technology that is being most tested for this purpose.
551 - JOSS could be cited. One recent paper is https://doi.org/10.31274/jlsc.18285 (note that I am an author of this).
577 - It would be useful to mention SciCodes.org here, and for the authors to participate in it if they don't already, or at least to make sure that their lessons get back to that community.