https://doi.org/10.5194/egusphere-2024-1141
11 Jun 2024

OpenMindat v1.0.0 R package: A machine interface to Mindat open data to facilitate data-intensive geoscience discoveries

Xiang Que, Jiyin Zhang, Weilin Chen, Jolyon Ralph, and Xiaogang Ma

Abstract. Powered by data-driven knowledge discovery technologies such as machine learning and deep learning, researchers are discovering increasingly interesting patterns in complex Earth science big data. Mindat ("mindat.org"), one of the world's largest mineral databases, contains vast amounts of knowledge that are yet to be mined. Through a project called OpenMindat, an application programming interface (API) has been set up to enable open data query and access from Mindat. This paper presents an open-source R package (OpenMindat v1.0.0) that bridges the gap between the Mindat API and its users, meeting their substantial data needs, facilitating data-intensive query and access, and supporting new geoscience discoveries. The package is designed to be user-friendly and extensible. It exploits the capabilities of the Mindat API, covering the data subjects of geomaterials (e.g., rocks, minerals, synonyms, varieties, mixtures, and commodities), localities, and the IMA (International Mineralogical Association)-approved mineral list. In addition to providing functions for querying those data subjects, the package supports exporting data to various formats such as CSV, JSON-LD, and TTL. In practice, these functions require only minimal coding, which makes them convenient for users with limited experience in the R environment. The package is openly available on GitHub under the MIT license, with detailed tutorial documentation. Mineralogy and many other geoscience disciplines now face opportunities enabled by open data. Research topics such as mineral network analysis, mineral association rule mining, mineral ecology, mineral evolution, and critical minerals have already benefited from Mindat's open data. We hope this R package will accelerate those data-intensive studies and lead to more scientific discoveries.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on egusphere-2024-1141', Wenqiang Tang, 14 Nov 2024
  • CC2: 'Comment on egusphere-2024-1141', Sensen Wu, 14 Nov 2024
  • RC1: 'Comment on egusphere-2024-1141', Anonymous Referee #1, 25 Nov 2024
  • RC2: 'Comment on egusphere-2024-1141', Dominik Hezel, 21 Dec 2024

Viewed

Total article views: 602 (including HTML, PDF, and XML)
  • HTML: 460
  • PDF: 115
  • XML: 27
  • Total: 602
  • BibTeX: 20
  • EndNote: 19
Views and downloads (calculated since 11 Jun 2024)

Viewed (geographical distribution)

Total article views: 577 (including HTML, PDF, and XML) Thereof 577 with geography defined and 0 with unknown origin.
Latest update: 21 Jan 2025
Short summary
This paper describes an R package that serves as the machine interface to the open data of Mindat.org, one of the world's most widely used databases of mineral species and their distribution. Over the past decades, many geoscientists have used Mindat data, but an open data service has never been fully established. The machine interface described in this paper offers an efficient way to meet these substantial data needs.