OpenMindat v1.0.0 R package: A machine interface to Mindat open data to facilitate data-intensive geoscience discoveries
Abstract. Data-driven knowledge discovery technologies such as machine learning and deep learning are revealing increasingly interesting patterns in large, complex Earth science datasets. Mindat ("mindat.org"), one of the world's largest mineral databases, contains vast amounts of knowledge that are yet to be mined. Through a project called OpenMindat, an application programming interface (API) has been set up to enable open data query and access from Mindat. This paper presents an open-source R package (OpenMindat v1.0.0) that bridges users' growing data needs and the Mindat API, facilitating data-intensive query and access and, in turn, supporting new insights and discoveries in geoscience. The package is designed to be user-friendly and extensible. It exploits the capabilities of the Mindat API, covering the data subjects of geomaterials (e.g., rocks, minerals, synonyms, varieties, mixtures, and commodities), localities, and the IMA (International Mineralogical Association)-approved mineral list. In addition to providing functions for querying those data subjects, the package supports exporting data to various formats such as CSV, JSON-LD, and TTL. In practice, these functions require only minimal coding, which makes them convenient for users with limited experience in the R environment. The package is openly available on GitHub under the MIT license, with detailed tutorial documentation. Mineralogy and many other geoscience disciplines face new opportunities enabled by open data. Research topics such as mineral network analysis, mineral association rule mining, mineral ecology, mineral evolution, and critical minerals have already benefited from Mindat's open data. We hope this R package will accelerate such data-intensive studies and lead to more scientific discoveries.