Potential of natural language processing for metadata extraction from environmental scientific publications

Blanchy, Guillaume; Albrecht, Lukas; Koestel, John; Garré, Sarah

doi:10.5194/egusphere-2022-535

Preprints

05 Jul 2022

Abstract. Climate change will most likely lead to an increase in extreme weather events, including heavy rainfall with soil surface runoff and erosion. Adopting agricultural management practices that increase the infiltration capacity of soil has the potential to mitigate these risks. However, the effects of agricultural management practices (tillage, cover crops, amendment, …) on soil variables (hydraulic conductivity, aggregate stability, …) often depend on the pedo-climatic context. Hence, the only way to gather the information needed to advise stakeholders on suitable management practices is to quantify such dependencies using meta-analyses of studies investigating this topic. As a first step, structured information needs to be extracted from scientific publications to build a meta-database, which can then be analyzed to give recommendations that depend on the pedo-climatic context.

Manually building such a database by going through all publications is very time-consuming. Given the increasing amount of literature, this task is likely to require more and more effort in the future. Natural language processing (NLP) facilitates this task, but it is not yet clear to what extent the extraction process is reliable or complete. In this work, two corpora of documents were used, referred to below as the OTIM and the Meta corpus. The OTIM corpus contains the source publications of the entries of the OTIM database of near-saturated hydraulic conductivity from tension-disk infiltrometer measurements (https://github.com/climasoma/otim-db). The Meta corpus consists of all primary studies from 36 selected meta-analyses on the impact of agricultural practices on sustainable water management in Europe. We focused on three NLP techniques: topic modeling, tailored regular expressions and dictionaries, and the shortest dependency path. We used topic modeling to sort the individual source publications of the Meta corpus into 6 topics (e.g. related to cover crops, biochar, …) with a coherence metric Cv ranging from 0.7 to 0.9. Then, we used tailored regular expressions and dictionaries to extract coordinates, soil texture, soil type, rainfall, disk diameter and tensions from the OTIM corpus. Recall (the share of all relevant information that was retrieved) ranged from 56 % to 100 %, and precision from 83 % to 100 %. Finally, we extracted relationships between a set of practice keywords (e.g. ‘biochar’, ‘zero tillage’, …) and soil variables (e.g. ‘soil aggregate’, ‘hydraulic conductivity’, ‘crop yield’, …) from the abstracts of the source publications of the Meta corpus using the shortest dependency path between them. These relationships were further classified according to positive, negative or absent correlations between the driver and soil property. This quickly provided an overview of the different driver-variable relationships and their abundance for an entire body of literature. Overall, we found that all three tested NLP techniques were able to support evidence synthesis tasks such as selecting relevant publications on a topic, extracting specific information to build databases for meta-analysis and providing an overview of relationships found in the corpus. While human supervision remains essential, NLP methods have the potential to support fully automated evidence synthesis that can be continuously updated as new publications become available.
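
To make the regular-expression step and its recall and precision scores more concrete, the following is a minimal Python sketch. It is not taken from the authors' notebooks: the pattern `RAINFALL_RE`, the helper names, the example sentences and the scoring convention (a wrong value counts both as a false positive and as a false negative) are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: a number followed by "mm", at most 40 non-digit
# characters after the word "rainfall" or "precipitation".
RAINFALL_RE = re.compile(
    r"(?:rainfall|precipitation)\D{0,40}?(\d+(?:\.\d+)?)\s*mm",
    flags=re.IGNORECASE,
)


def extract_rainfall(text):
    """Return the first rainfall value (in mm) matched in text, or None."""
    match = RAINFALL_RE.search(text)
    return float(match.group(1)) if match else None


def precision_recall(extracted, expected):
    """Score extracted values against manually curated ones (None = absent)."""
    pairs = list(zip(extracted, expected))
    tp = sum(1 for e, t in pairs if e is not None and e == t)  # correct value
    fp = sum(1 for e, t in pairs if e is not None and e != t)  # spurious or wrong value
    fn = sum(1 for e, t in pairs if t is not None and e != t)  # missed or wrong value
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented example sentences with the values a human curator would record.
docs = [
    "Mean annual rainfall is 650 mm at the site.",
    "The site receives 820 mm of rain per year.",
    "Rainfall simulators applied water through 12 mm nozzles.",
    "No climate data were reported.",
]
expected = [650.0, 820.0, None, None]

extracted = [extract_rainfall(d) for d in docs]
precision, recall = precision_recall(extracted, expected)
```

In this toy case the pattern misses the second sentence (the value precedes the word "rain") and wrongly picks up a nozzle size in the third, so both scores fall below 1. This is the kind of trade-off between generality and specificity of a user-defined expression that such an evaluation measures.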

Received: 23 Jun 2022 – Discussion started: 05 Jul 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

14 Mar 2023

Potential of natural language processing for metadata extraction from environmental scientific publications

Guillaume Blanchy, Lukas Albrecht, John Koestel, and Sarah Garré

SOIL, 9, 155–168, https://doi.org/10.5194/soil-9-155-2023, 2023

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2022-535', Anonymous Referee #1, 10 Aug 2022

Interesting study regarding the use of natural language processing methods to extract information from the growing volume of scientific literature. The authors not only illustrate the use of different algorithms but also try to evaluate them numerically. In general, a well written manuscript. However, I think there is a lack of discussion and some of their objectives/aims are weakly met. The "relationship extraction" section is interesting and well written and the authors might want to put the same effort in the rest of the sections.

Comments

- Abstract: The beginning abstract seems a bit disconnected with the rest of the manuscript. Climate change is a hot topic but the paper itself is not related to that. I would suggest re-framing the abstract to match the content of the manuscript.

- Assessing the ability of an algorithm such as regex: I find this evaluation a bit strange. The algorithm itself is infallible in the sense that it always finds what you tell it to find if it is present in the text. The algorithm is only restricted by the capacity of the user to generate valid regular expressions.

- Topic modelling: There is no discussion.

- How did you achieve your second aim (to illustrate the ability of topic classification to classify a new paper as relevant to a given topic)?

- You mention that topic modelling "can help identify knowledge gaps". How? Did you find any? If your aim is to present a practical workflow, perhaps you should guide the user to achieve that.

- Why did you select 6 topics instead of 9? You only mention that you are trying to maximise the coherence, which is higher for 9 topics.

- How might the number of topics affect your workflow? Is selecting the highest coherence score infallible?

- Could you elaborate on how excluding monograms increased the coherence? From the term frequencies (Fig 7) I do not see many soil related terms, which seems strange. Perhaps they were ignored since they appeared as monograms? I do agree that bi- and even trigrams are important but I have usually seen them added to a selection of monograms.

Citation: https://doi.org/10.5194/egusphere-2022-535-RC1
- AC1: 'Reply on RC1', Guillaume Blanchy, 12 Jan 2023
  
  General:
  Interesting study regarding the use of natural language processing methods to extract information from the growing volume of scientific literature. The authors not only illustrate the use of different algorithms but also try to evaluate them numerically. In general, a well written manuscript. However, I think there is a lack of discussion and some of their objectives/aims are weakly met. The "relationship extraction" section is interesting and well written and the authors might want to put the same effort in the rest of the sections.
  We appreciate that you find the study interesting and we thank you for your useful comments on the content that will help to improve the manuscript. We would like to state that the primary aim of the study was to demonstrate a practical workflow of several NLP techniques for summarising a large body of scientific literature. This was not properly reflected in the aims of our study. We will modify the aims accordingly in the revised version of the manuscript.
  We acknowledge that the “topic analysis” part is less developed and only weakly met objective 2, which was to assess whether a paper is relevant to a topic. In this regard, we plan to restructure the content around topic classification in the manuscript. Instead of classifying “new papers” into different topics, we will now demonstrate how to identify groups of manuscripts (in our case, groups around different types of “agricultural practices”) and observe which groups are less represented (or absent). In this way, we can show which practices are less studied and identify possible knowledge gaps. This also serves as a first classification to identify, for instance, topics for which a meta-analysis would be well suited.
  
  Specific comments:
  - Abstract: The beginning abstract seems a bit disconnected with the rest of the manuscript. Climate change is a hot topic but the paper itself is not related to that. I would suggest re-framing the abstract to match the content of the manuscript.
  We will rephrase the abstract such that the main focus will be NLP techniques to summarise a large body of scientific environmental literature and then present the OTIM and Meta corpora as a case study on which we applied these techniques.
  - Assessing the ability of an algorithm such as regex: I find this evaluation a bit strange. The algorithm itself is infallible in the sense that it always finds what you tell it to find if it is present in the text. The algorithm is only restricted by the capacity of the user to generate valid regular expressions.
  We agree that the regex algorithm is infallible; in this case, however, we want to estimate how well user-defined regexes are able to recover specific information. We will make clear in the manuscript that we do not assess the ability of the regex algorithm but rather the ability of the user-generated regular expressions to match relevant content, considering the trade-off between their generality and their specificity.
  - Topic modelling: There is no discussion.
  Further discussion will be added, especially on how topic classification can be used as one of the first steps of the presented semi-automated NLP workflow for information summary and identifying groups of abundant literature where a meta-analysis can be useful.
  - How did you achieve your second aim (to illustrate the ability of topic classification to classify a new paper as relevant to a given topic)?
  (see general comment)
  - You mention that topic modelling "can help identify knowledge gaps". How? Did you find any? If your aim is to present a practical workflow, perhaps you should guide the user to achieve that.
  We agree that a practical interpretation will be a useful addition to the manuscript. We will give a few examples in the manuscript and develop how we identify them.
  - Why did you select 6 topics instead of 9? You only mention that you are trying to maximise the coherence, which is higher for 9 topics.
  That is a fair point and will be corrected in the next version of the manuscript.
  - How might the number of topics affect your workflow? Is selecting the highest coherence score infallible?
  It is not infallible, and we found that choosing between 6 and 9 topics tends to lead to the same groups. The variability in coherence for each number of topics can be large, especially for a relatively small corpus such as ours. This will be discussed in the revised version of the manuscript.
  - Could you elaborate on how excluding monograms increased the coherence? From the term frequencies (Fig 7) I do not see many soil related terms, which seems strange. Perhaps they were ignored since they appeared as monograms? I do agree that bi- and even trigrams are important but I have usually seen them added to a selection of monograms.
  In our case, including monograms caused words like ‘soil’, ‘treatment’, ‘water’, ‘crop’ or ‘tillage’ to appear prominently in the different topics. This made the topics hard to differentiate, and the average topic coherence in this case was Cv = 0.4. With only bi-grams, some of these words carried more meaning (“conventional tillage”, “soil water”, “cover crop”) and made it easier to see what each topic is about. This is why, in this case, we preferred to use only bi-grams. This remark is a good point, and we recognize that adding monograms, as seen in other work, can sometimes help. This will be discussed in the revised manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2022-535-AC1
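
As an illustration of the reply above on monograms and bi-grams, the short Python sketch below counts both for two invented titles. It is not the authors' code: the stopword list, the tokenizer and the example titles are assumptions, and the topic model itself is left out. Note that stopwords are removed before pairing, so words separated only by a stopword become adjacent.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this example only.
STOPWORDS = {"the", "of", "and", "in", "on", "a", "under", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def bigrams(tokens):
    """Join each pair of consecutive tokens into one bi-gram string."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


# Invented titles standing in for abstracts of the corpus.
abstracts = [
    "Cover crop effects on soil water under conventional tillage",
    "Conventional tillage and cover crop residue in the soil",
]

unigram_counts = Counter()
bigram_counts = Counter()
for abstract in abstracts:
    tokens = tokenize(abstract)
    unigram_counts.update(tokens)
    bigram_counts.update(bigrams(tokens))
```

Here generic monograms such as "soil" or "crop" occur in both titles and say little about the practice studied, whereas the recurring bi-grams "cover crop" and "conventional tillage" name the practices directly.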
RC2:
'Comment on egusphere-2022-535', Anonymous Referee #2, 28 Nov 2022

General:

Overall this manuscript fits well with SOIL, and the methodology as well as the results will be of interest to readers. The nature of the study, involving "natural language processing for metadata extraction from environmental {soil} scientific publications" is inherently multidisciplinary, and complex! The necessary methods are well discussed and well referenced, and the appendix of the NLP software will be a big help to researchers in this field. The results relating agricultural practices and soil and site properties are novel and important.

Specific:

Most SOIL readers are probably substantially unfamiliar with NLP and would benefit from more focused guidance by the authors, which can perhaps be accomplished most easily by a trimmed revision. For example, the Abstract is overly complex; the Introduction states the objectives of the study in just four lines (96-100), and a trimmed Abstract could focus simply on achieving those objectives.

The Material and Methods section is appropriately long, given the emphasis on methods, but could be edited to be more uniformly coherent. Perhaps part of that could be fixed by reformatting the variety of figures, and relegating some of them to just the appendix.

Most of the figures in the Results section are important, but much of the other discussions in Results are really recommendations and can be eliminated or partly moved to Conclusions.

Technical:

I see Reviewer #1 listed some technical issues, most of which I believe can be handled by trimming as suggested.

Citation: https://doi.org/10.5194/egusphere-2022-535-RC2
- AC2: 'Reply on RC2', Guillaume Blanchy, 12 Jan 2023
  
  General:
  Overall this manuscript fits well with SOIL, and the methodology as well as the results will be of interest to readers. The nature of the study, involving "natural language processing for metadata extraction from environmental {soil} scientific publications" is inherently multidisciplinary, and complex! The necessary methods are well discussed and well referenced, and the appendix of the NLP software will be a big help to researchers in this field. The results relating agricultural practices and soil and site properties are novel and important.
  We appreciate that you find this manuscript well suited to the journal SOIL and, more specifically, to a multi-disciplinary topic related to agricultural practices. We are also glad to hear that our effort towards a reproducible workflow (by means of notebooks and a GitHub repository) is acknowledged.
  
  Specific:
  Most SOIL readers are probably substantially unfamiliar with NLP and would benefit from more focused guidance by the authors, which can perhaps be accomplished most easily by a trimmed revision. For example, the Abstract is overly complex; the Introduction states the objectives of the study in just four lines (96-100), and a trimmed Abstract could focus simply on achieving those objectives.
  Agreed. As mentioned in the reply to RC1, we will refocus the abstract around “NLP techniques” and the objectives we want to address in this work. Additionally, we will make sure that the NLP-specific language is explained and simplified to make the abstract accessible to most readers.
  The Material and Methods section is appropriately long, given the emphasis on methods, but could be edited to be more uniformly coherent. Perhaps part of that could be fixed by reformatting the variety of figures, and relegating some of them to just the appendix.
  Figure 3 and Table 2 will be moved to the appendix to ease the flow through the Material and Methods section.
  Most of the figures in the Results section are important, but much of the other discussions in Results are really recommendations and can be eliminated or partly moved to Conclusions.
  Thank you for the feedback. We will edit the results and discussion accordingly and move recommendations to the conclusions section.
  Technical:
  I see Reviewer #1 listed some technical issues, most of which I believe can be handled by trimming as suggested.
  See reply to RC1.
  
  Citation: https://doi.org/10.5194/egusphere-2022-535-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (review by editor) (13 Jan 2023) by Olivier Evrard

AR by Guillaume Blanchy on behalf of the Authors (27 Jan 2023) Author's response Author's tracked changes Manuscript

ED: Publish as is (27 Jan 2023) by Olivier Evrard

ED: Publish as is (03 Feb 2023) by Kristof Van Oost (Executive editor)

AR by Guillaume Blanchy on behalf of the Authors (13 Feb 2023) Manuscript

Model code and software

NLP jupyter notebooks Guillaume Blanchy https://github.com/climasoma/nlp

Viewed

Total article views: 1,754 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
965	724	65	1,754	49	102

Views and downloads (calculated since 05 Jul 2022)

Month	HTML	PDF	XML	Total
Jul 2022	137	35	6	178
Aug 2022	54	25	2	81
Sep 2022	19	19	0	38
Oct 2022	36	12	1	49
Nov 2022	29	37	4	70
Dec 2022	19	22	1	42
Jan 2023	21	17	3	41
Feb 2023	23	17	0	40
Mar 2023	5	6	0	11
Apr 2023	0
May 2023	0
Jun 2023	0
Jul 2023	0
Aug 2023	0
Sep 2023	0
Oct 2023	0
Nov 2023	0
Dec 2023	0
Jan 2024	0
Feb 2024	0
Mar 2024	0
Apr 2024	0
May 2024	6	7	1	14
Jun 2024	11	15	0	26
Jul 2024	12	8	2	22
Aug 2024	16	12	4	32
Sep 2024	6	4	2	12
Oct 2024	10	8	0	18
Nov 2024	14	2	0	16
Dec 2024	12	4	6	22
Jan 2025	8	14	0	22
Feb 2025	18	10	0	28
Mar 2025	14	18	0	32
Apr 2025	6	26	0	32
May 2025	10	18	4	32
Jun 2025	20	26	0	46
Jul 2025	24	26	0	50
Aug 2025	24	40	6	70
Sep 2025	10	46	2	58
Oct 2025	12	22	0	34
Nov 2025	62	74	0	136
Dec 2025	64	32	4	100
Jan 2026	100	20	4	124
Feb 2026	64	28	4	96
Mar 2026	52	32	6	90
Apr 2026	34	18	1	53
May 2026	10	23	2	35
Jun 2026	3	1	0	4

Viewed (geographical distribution)

Total article views: 1,694 (including HTML, PDF, and XML) Thereof 1,694 with geography defined and 0 with unknown origin.

Latest update: 09 Jun 2026

Short summary

Adapting agricultural practices to future climatic conditions requires synthesizing the effects of management practices on soil properties with respect to local soil and climate. This study showcases different automated text processing methods to identify topics, extract metadata for building a database and summarize findings from publication abstracts. While human intervention remains essential, these methods show great potential to support evidence synthesis from a large number of publications.

