GC Insights: Breaking the silos – leveraging NLP to encourage interdisciplinary interaction at the EGU

Sodoge, Jan; Nunes Carvalho, Taís Maria; de Brito, Mariana Madruga

doi:https://doi.org/10.5194/egusphere-2024-3430

18 Dec 2024

Abstract. Thousands of abstracts from various geoscience sub-fields are presented annually at the EGU General Assembly (GA), offering a rich resource for tracking scientific progress. However, rigid session groupings can limit cross-disciplinary exploration. Here, we show that participants focusing only on their broad disciplinary session miss an average of 44 % of the 10 most relevant contributions. To break this compartmentalization, we propose using natural language processing (NLP), enabling the geoscience community to explore the full breadth of knowledge beyond traditional disciplinary boundaries.

How to cite. Sodoge, J., Nunes Carvalho, T. M., and de Brito, M. M.: GC Insights: Breaking the silos – leveraging NLP to encourage interdisciplinary interaction at the EGU, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2024-3430, 2024.

Received: 04 Nov 2024 – Discussion started: 18 Dec 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2024-3430', Lina Stein, 17 Jan 2025

The authors present an analysis of all abstracts submitted to the EGU General Assembly in the last five years. The abstracts are clustered according to semantic similarity of the abstract text. For each scientific division, the distribution of abstracts across the cluster space is evaluated to demonstrate that similar research is submitted across different scientific divisions within the EGU GA. The authors claim that, since attendees primarily attend sessions from their own scientific division, attendees might miss relevant research (bubble effect). I found the overview of abstract similarity across scientific divisions very informative. It demonstrates that the scientific divisions have more overlap than the strict separation necessary for conference organization might suggest.
Additionally, the authors propose a webtool, which can recommend EGU GA submissions based on similarity to a search string. I have actually used the prototype during the 2024 EGU and found it immensely helpful.
Overall, I find this article timely, relevant, and well-written. I only have a few minor suggestions for improvement.
Specific comments:
Your main assumption is that people will not attend presentations of research similar to their own submission due to it being in another scientific division. E.g. Line 10 “We hypothesize that this compartmentalization may inadvertently create knowledge silos, as EGU GA attendants tend to focus on their own scientific divisions, potentially missing relevant developments from other disciplines.” From my own (probably biased) experience I would say that cross-division attendance does take place a lot during EGU. And the EGU website does already offer a search tool that allows people to find abstracts across scientific divisions according to keyword searches. However, text similarity offers a much better tool to identify presentations of interest. Simple keyword searches often still produce a large number of results. While it is, of course, impossible to quantify which scientific division sessions are attended by whom during the GA, it would help your claim to offer some more numbers to demonstrate the infeasibility of manually checking all sections for relevant contributions. How many abstracts are submitted per section? How many sessions are registered per section?
One thing I found irritating is that the 22 scientific divisions or scientific sections are referred to as sessions. There are multiple sessions organised within each scientific division. But the authors split their data into scientific divisions and called them sessions. That the selection on the EGU GA website offers “Disciplinary sessions” is not a name, but means that they are the sessions related to one specific division.
Please clarify in the abstract and text what you mean by participant exploration or participant focus. It should be stated early on and more clearly, that you refer to choice of session/division attendance during the GA. (And not, for example, choice of session during the abstract submission process).
Lastly, it would be nice to add the number of people who have used the webtool to find relevant sessions during EGU24.

Citation: https://doi.org/10.5194/egusphere-2024-3430-RC1
- AC1: 'Reply on RC1', Jan Sodoge, 27 Jan 2025
  
  Thanks for your positive feedback. Below we provide answers to your comments and how we will address these in the updated manuscript.
  
  Comment: Your main assumption is that people will not attend presentations of research similar to their own submission due to it being in another scientific division. E.g. Line 10 “We hypothesize that this compartmentalization may inadvertently create knowledge silos, as EGU GA attendants tend to focus on their own scientific divisions, potentially missing relevant developments from other disciplines.” From my own (probably biased) experience I would say that cross-division attendance does take place a lot during EGU. And the EGU website does already offer a search tool that allows people to find abstracts across scientific divisions according to keyword searches. However, text similarity offers a much better tool to identify presentations of interest. Simple keyword searches often still produce a large number of results. While it is, of course, impossible to quantify which scientific division sessions are attended by whom during the GA, it would help your claim to offer some more numbers to demonstrate the infeasibility of manually checking all sections for relevant contributions. How many abstracts are submitted per section? How many sessions are registered per section?
  Answer: This is a good suggestion. Indeed, as you point out, our initial assumption is very stylized and might not reflect reality at the EGU GA. In the updated manuscript, we state alongside this hypothesis that it likely does not reflect reality, but that such a simplification is necessary for our assessment. This is a limitation of our study that requires clearer acknowledgment. In fact, for the next EGU, we plan to survey individual users' attributes (e.g., which division(s) a user belongs to) to assess the quality of recommendations and select the most appropriate model. Regarding the current manuscript, we agree and will implement your suggestion of adding some numbers on the volume of abstracts to emphasize the 'infeasibility of manually checking'.
  
  Comment: One thing I found irritating is that the 22 scientific divisions or scientific sections are referred to as sessions. There are multiple sessions organised within each scientific division. But the authors split their data into scientific divisions and called them sessions. That the selection on the EGU GA website offers “Disciplinary sessions” is not a name, but means that they are the sessions related to one specific division.
  
  Please clarify in the abstract and text what you mean by participant exploration or participant focus. It should be stated early on and more clearly, that you refer to choice of session/division attendance during the GA. (And not, for example, choice of session during the abstract submission process).
  Answer: We agree with your criticism of the unclear terminology. We used the term "disciplinary sessions" throughout the manuscript because it is the name used on the GA website, but we understand that "scientific division" is much clearer. As such, we revised the entire manuscript and clarified in the text that we use the term "scientific divisions" to refer to what are sometimes called disciplinary sessions.
  
  Comment: Lastly, it would be nice to add the number of people who have used the webtool to find relevant sessions during EGU24.
  Answer: Sadly, we have no clear evidence of the number of users based on the platform and subscription we used for hosting this app (shinyapps.io). The only metric we could quantify is 45 hours of usage time. However, this does not correspond to a particular number of users. For the upcoming EGU25 though, we plan to quantify such metrics more in-depth by hosting the application via our computational resources (see also the first comment on our ambitions to quantify more aspects to improve the evaluation).
  On behalf of the authors,
  
  Jan Sodoge
  
  Citation: https://doi.org/10.5194/egusphere-2024-3430-AC1
CC1:
'Comment on egusphere-2024-3430, by Maria-Helena Ramos', Maria-Helena Ramos, 14 Mar 2025
Dear authors,
I enjoyed reading the paper and thank you for the initiative and study, which is very interesting. As current EGU Programme Committee Co-chair for the preparation of the General Assembly, I would like to list some suggestions below, which I hope may clarify some points and improve your manuscript.
General comments:
Scientific sessions in the EGU GA are structured around disciplinary Programme Groups (PG), complemented by the EOS (Education and Outreach Sessions) programme group and the ITS (Inter- and Transdisciplinary Sessions) programme group, as they are called. Below I indicated where I think this organization should be clarified to avoid misunderstanding with the terminology. In several instances, I believe that “session” is misused in the place of “programme group”. They are indicated in details below.

While I understand the choice of not considering sessions in the ITS programme group (lines 33-36), I think some clarification may be given also in relation to the fact that some “disciplinary” programme groups are, in fact, highly “inter- or transdisciplinary” too. This is, to a high degree, notably the case of NH (Natural Hazards) and Earth and Space Science Informatics (ESSI), for instance, and, maybe to a lower degree, to some other disciplinary programme groups. How could this issue affect your analysis and results?

Also, I was wondering how co-organization of sessions within two or more programme groups were considered in the analysis, and how this might affect the results. Have you selected abstracts only from sessions that are not co-organized by other PGs? (session co-organizations are marked as such in the EGU GA programme: “CR3.3 - Advances in sea-ice modelling: developments and new techniques. Co-organized by NP1/OS1”. In this example, the session is led by the CR programme group, but since it is co-organized it is also displayed in the NP and the OS programme groups). Programme groups have different percentage of co-organized sessions (over the total number of led sessions), which can vary significantly.

When looking at Fig. 1, I was also wondering how much the very different sizes of the PGs may affect the results. The EGU programme has PGs with 2,000+ abstracts in 200+ sessions, as well as PGs with 200+ abstracts in 10+ sessions. Would this affect the analysis and results? How/why not?

If I understand well, the paper suggests that using an automatic k-clustering approach could lead to more topically-similar programme groups at the EGU GA. While this might be a good idea, I think one has to consider also the “errors” of the text similarity approach. Maybe adding some comments on that in the Discussion section would be important here. Also, one has to consider that, currently, mostly it is up to the authors to make the decision of where they want to submit their abstract to. From my experience, I believe this choice depends on the disciplines, but also on the community around a given session (conveners team, authors of the same session in the past year(s), etc.). Under a clustering approach as proposed, this might be lost, if I understand well, and the system will make this choice for the author. In your opinion, what would be the obstacles/limitations here?

In the Discussion section, it could be interesting to add that the authors are currently working with the EGU Programme Committee and Copernicus Conference Manager to experiment with additional tools that could be integrated to the conference system and improve the experience attendees and authors may have in future EGU General Assemblies. These experiments can be useful to validate the NLP algorithms in real case applications.

Specific comments:
General suggestion: replace “attendants” with “attendees”
Line 7: EGU General Assembly gathers over 20,000 participants worldwide. I suggest to update the numbers here.
Lines 8-9: change to: “…the EGU GA is currently structured into 22 disciplinary programme groups (PGs) and further sessions included in the ITS (Inter- and Transdisciplinary Sessions) programme group and the EOS (Education and Outreach Sessions) programme group, among other Union-wide sessions. We hypothesize…”
Line 10: change to: “…as EGU GA attendees might tend to focus on the disciplinary PG related to their own scientific discipline, potentially…” => I believe that “might” should be used here as, to my knowledge, so far there is no study that has put this into evidence.
Line 22: change to: “… the compartmentalization at the EGU GA programme presentation may…” => since there are instruments for co-organizations as well as other Union-wide sessions in the EGU GA programme, I believe that a precision must be added here to say that it refers to the EGU GA programme presentation (not to the EGU GA as a whole).
Line 30: do you mean “corresponding session identification (session ID)”? I think “session data” is a bit unclear.
Line 31: change “presented in 22 disciplinary sessions” to “presented in the 22 disciplinary programme groups”, and Line 31-32: change to “disciplinary programme groups” (or “disciplinary PGs”) => I think you are referring to PGs, rather than sessions. “Sessions” are the individual sessions inside the PGs, and there are more than 22 of those at EGU GA (usually, there are 1,000+ sessions in an EGU GA).
Line 32: change to: “…may vary significantly…”
Line 33: suggestion to change to: “… identify their research as “interdisciplinary”,
Line 34: I do not fully understand this sentence: could you clarify? I am not sure that “rarely align with ITS sessions” may be fully correct. The ITS sessions are very popular among authors, and the number of abstract submissions to the ITS PG has well increased over the years. Also, it is not clear to me what you mean by “tend to be… narrowly focused”, since the nature itself of the sessions in the ITS PG is to be broad, involving one or more disciplines.
Line 35: change to: “… concentrating only on abstracts submitted to sessions within disciplinary programme groups provides…”
Lines 50-53: I got a bit confused here: do you mean “sessions” or “programme groups”? I have the impression that you mean “programme groups” (at least on lines 52 and 53); please check (see also my general comment above)
Lines 54-55: change to: “…to evaluate how well the grouping of the EGU GA abstracts in the 22 PGs compares with the clustering using the k-means clustering algorithm, considering the same number of clusters as the number of disciplinary PGs (n=22).” Please, pay attention to the fact that there may be a terminology-based confusion here also between "sessions" and “programme groups”.
Line 60: suggestion to change to “…landscape of the EGU GA abstract sample analysed here, …”
Line 61: change “disciplinary session” to “disciplinary PG”
Line 62: change “… and climate (CL) session…” to “… and climate (CL) programme groups...”
Line 64: change “disciplinary sessions” to “disciplinary PGs”
Lines 64-65: suggestion to change to “… to a research abstract on modelling… include an abstract on evaluating…”
Lines 67: I think here also you mean “presented in different programme groups”, not different “sessions”; is that so? Or maybe “presented in different sessions, belonging to different programme groups” (?)
Line 67: Maybe complete with the following: “… would potentially be missed if an attendee only consults one programme group and the sessions are not co-organized”. => I think this is important to mention since consulting other PGs when preparing a “personal programme” and displaying co-organized sessions in all the programmes of the PGs involved in the co-organization (as done in the EGU GA programme) would minimize this effect.
Line 68: change to “disciplinary programme groups”
Line 69: also here, do you mean “within the programme group where their abstract was submitted to potentially…”? Instead of “session”?
Line 70: Suggestion: I would rather say “…. of the 10 contributions most similar to their abstract”.
Line 70 and Line 72: change “session” to “programme group” in the three occurrences in these lines. AS and ST, as well as GI and NP, are not sessions, but “programme groups”.
Lines 73-74: I think the sentence is a bit confusing (I could not grasp the message/meaning). Maybe rephrase it? Sentence: “…reduce the share of relevant contributions covered by the own “session=> programme group”
Line 75: change to “using disciplinary programme groups”
Line 77-78: I am bit confused here: do you mean to group “similar abstracts”? I do not understand the term “group sessions”. Sessions are grouped in programme groups. Maybe you mean: “to build programme groups that instead of focusing on a discipline would bring together statistically similar abstracts”.
Line 81: the sentence may be missing something: to keep track of what exactly? Have you consulted participants on that? And organizers (what type of organizers? Conveners? Programme Group chairs?). I think it might be useful to provide more details here.
Line 82: replace by “last five General Assemblies”
Line 84: suggestion: “the participant identifies most with”. Having said that, I believe that the issue is not only “which discipline a participant identifies with”, but how participants search the programme of the GA to prepare their own personal programme: do they stay in the same programme group or do they navigate through the other programme groups as well? As the number of abstracts increases, I agree that searching through the programme, inside the programme groups and their respective sessions, might become a laborious task though!
Line 85: suggestion: “the possible presence of a bubble effect” => to me, a bubble effect in itself needs more than a disciplinary-based programme display organization; it requires also that participants do not look outside their disciplinary programme group, for instance.
Line 86: replace “sessions” to “programme groups” or “disciplines”
Line 87: could it be related to the fact that in EGU GA NP and GI are also smaller programme groups, comparatively to PGs such as AS and HS, for instance?
Line 88: replace “sessions” to “programme groups”
Line 94: replace “session” to “programme group”
Line 95: change to :.. interdisciplinary connections, which in turn may lead to innovative…”. Also, I don’t understand the “higher productivity” here? Do you mean (even) more research papers being published? Projects funded? Please, explain.
Lines 96-97: change “EGU GA 2024” to “EGU24 General Assembly”.
Line 98: replace “sessions” to “programme groups”
Line 100: replace “sessions” to “programme groups”
Figure 1 caption: change “session” to “programme group” in the four occurrences.
Figure 1A: I think it could be slightly re-worded to avoid potential misunderstandings. Suggestion: “…
Thank you again for the work done and the contribution to making EGU General Assemblies an even better and exciting event!
Citation: https://doi.org/10.5194/egusphere-2024-3430-CC1
- AC3:
  'Reply on CC1', Jan Sodoge, 01 Apr 2025
  Dear Helena,
  
  thank you for the constructive feedback on our manuscript. Your perspective as an organizer of these EGU sessions is, I believe, really helpful for improving our manuscript. Below, please find our answers to your comments.
  General comments:
  Comment: Scientific sessions in the EGU GA are structured around disciplinary Programme Groups (PG), complemented by the EOS (Education and Outreach Sessions) programme group and the ITS (Inter- and Transdisciplinary Sessions) programme group, as they are called. Below I indicated where I think this organization should be clarified to avoid misunderstanding with the terminology. In several instances, I believe that “session” is misused in the place of “programme group”. They are indicated in details below.
  
  Answer: This is correct. We have updated the terminology accordingly in the revised manuscript. In line with the comments from Reviewer 1 (Lina Stein), we now differentiate between scientific divisions and sessions. To enhance clarity for readers, we use the term scientific division instead of program groups. In the introduction, we define scientific divisions and program groups as overlapping concepts or synonymous, at least for the scope of this analysis. From the perspective of an EGU GA participant, we believe that scientific division is a more intuitive term.
  
  Comment: While I understand the choice of not considering sessions in the ITS programme group (lines 33-36), I think some clarification may be given also in relation to the fact that some “disciplinary” programme groups are, in fact, highly “inter- or transdisciplinary” too. This is, to a high degree, notably the case of NH (Natural Hazards) and Earth and Space Science Informatics (ESSI), for instance, and, maybe to a lower degree, to some other disciplinary programme groups. How could this issue affect your analysis and results?
  
  Answer: We understand that the decision not to consider the ITS program group is debatable—we debated this as well. In the submitted version of the manuscript, we initially decided not to integrate these sessions. However, we will include them in the updated manuscript.
  
  Our dataset contains 3,663 presentations associated with the ITS group. For the updated results, we added ITS sessions to their associated program groups (e.g., ITS and NH). We then repeated the experiment to evaluate whether the share of missed talks had changed. Our results revealed that adding ITS sessions actually increased the number of missed relevant presentations in some divisions (AS, CR, BG, HS, SSS), but it did not reduce the number of missed presentations in any division.
  
  We attribute this to a potential mechanism, which we will include alongside the results in the manuscript. Because ITS sessions are interdisciplinary, their thematic coherence is often lower than that of specific divisions. As a result, selected presentations from ITS sessions have a lower chance of matching relevant talks within their associated division. Despite this potential explanation, ITS sessions still offer valuable additional interdisciplinary perspectives which we will emphasize next to those results.
  
  Yet, these additional findings should still be interpreted with caution, taking into account your comment 3 (see below) and our response, which will also be incorporated into the updated manuscript.
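The "share of missed presentations" discussed in this answer can be illustrated with a minimal sketch. It assumes a simple bag-of-words cosine similarity as a stand-in for the text representation used in the manuscript, which is not specified in this discussion; the function names and the toy abstracts are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words counts (a stand-in for a real text embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def missed_share(abstracts, index, k=10):
    """Fraction of the k most similar abstracts that lie outside the
    division of abstracts[index]. Each item is (division, text)."""
    own_division, own_text = abstracts[index]
    query = bow(own_text)
    others = [(cosine(query, bow(text)), division)
              for i, (division, text) in enumerate(abstracts) if i != index]
    top = sorted(others, key=lambda pair: pair[0], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(1 for _, division in top if division != own_division) / len(top)

# Hypothetical toy data, not taken from the EGU abstracts.
toy = [
    ("HS", "flood forecasting with hydrological models"),
    ("HS", "groundwater recharge in arid basins"),
    ("NH", "flood risk and hydrological models for early warning"),
    ("CL", "climate change impacts on flood forecasting"),
    ("AS", "aerosol optical depth retrievals"),
]
share = missed_share(toy, index=0, k=2)
print(share)
```

In this toy case both of the two most similar abstracts belong to other divisions, so a reader who stays within HS would miss all of them. Averaging this value over all abstracts of a division gives a per-division figure of the kind reported in the answer above.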
  
  Comment: Also, I was wondering how co-organization of sessions within two or more programme groups were considered in the analysis, and how this might affect the results. Have you selected abstracts only from sessions that are not co-organized by other PGs? (session co-organizations are marked as such in the EGU GA programme: “CR3.3 - Advances in sea-ice modelling: developments and new techniques. Co-organized by NP1/OS1”. In this example, the session is led by the CR programme group, but since it is co-organized it is also displayed in the NP and the OS programme groups). Programme groups have different percentage of co-organized sessions (over the total number of led sessions), which can vary significantly.
  
  Answer: This is an important point that we did not clarify enough in the manuscript; your comment made us aware of a shortcoming in our data collection. When collecting data for the analysis, we did not store which program groups/divisions co-organize a session. Instead, we only have information about the group leading the session (e.g., in your example, CR). Hence, in our analysis, we can only consider which program group/division leads a session, which is a clear limitation.
  
  We will add this limitation to the manuscript, highlighting that, as you mentioned, the share of such co-organized sessions varies by program group and might influence the results. An important note is that we still consider the ITS sessions (see previous comment).
  
  Comment: When looking at Fig. 1, I was also wondering how much the very different sizes of the PGs may affect the results. The EGU programme has PGs with 2,000+ abstracts in 200+ sessions, as well as PGs with 200+ abstracts in 10+ sessions. Would this affect the analysis and results? How/why not?
  
  Answer: This is an important consideration, and we looked into it. We found a correlation of 0.38 between the number of relevant talks in the same division and the number of presentations per division. This indicates a weak positive relationship: divisions with more presentations in total also tend to have a higher share of relevant presentations among the most similar ones for each abstract.
  
  A possible explanation is that with more presentations, there is a higher chance of finding a similar presentation within the larger "cloud" of presentations—using the language of the cartography of presentations we created in the main figure. However, since the correlation is <0.4, this mechanism does not seem particularly strong, meaning our results are unlikely to be heavily influenced by it.
  
  Nonetheless, we will add this to the results section as an important consideration regarding how division size might affect the metric of missed presentations.
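A check of this kind can be sketched as follows. The sketch assumes a Pearson correlation coefficient, which the answer does not name explicitly, and the division sizes and shares are hypothetical numbers chosen only to make the example runnable.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Hypothetical values: presentations per division and the share of the
# most similar abstracts found within the same division.
division_sizes = [2000, 1500, 800, 400, 200]
in_division_share = [0.70, 0.55, 0.60, 0.40, 0.50]

r = pearson(division_sizes, in_division_share)
print(round(r, 2))
```

A coefficient well below 1 would support the reading given above, namely that division size explains only part of the variation in the metric.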
  
  Comment: If I understand well, the paper suggests that using an automatic k-clustering approach could lead to more topically-similar programme groups at the EGU GA. While this might be a good idea, I think one has to consider also the “errors” of the text similarity approach. Maybe adding some comments on that in the Discussion section would be important here. Also, one has to consider that, currently, mostly it is up to the authors to make the decision of where they want to submit their abstract to. From my experience, I believe this choice depends on the disciplines, but also on the community around a given session (conveners team, authors of the same session in the past year(s), etc.). Under a clustering approach as proposed, this might be lost, if I understand well, and the system will make this choice for the author. In your opinion, what would be the obstacles/limitations here?
  
  Answer: When discussing the use of k-means for organizing sessions at EGU, we were careful not to propose a purely text-based approach. We fully acknowledge the limitations of text similarity methods, including potential classification errors and the challenge of capturing disciplinary nuances. Your point about the role of authors in selecting the most appropriate session—often influenced by community ties, and past participation—is particularly important.
  
  We do not yet have a clear roadmap for how k-means clustering could be implemented. Most likely, it would need to be integrated through a co-design process to ensure it complements, rather than replaces, the existing system. Our initial framing was intentionally somewhat provocative to stimulate discussion on these complexities. In the revised manuscript, we will clarify this perspective and explicitly address its limitations, as well as the need to preserve author agency in session selection.
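To make the comparison between clusters and existing divisions concrete, the following minimal sketch runs plain k-means (Lloyd's algorithm) on hypothetical two-dimensional points and measures how well the resulting clusters agree with division labels. The purity score, the deterministic initialisation, and the toy data are assumptions for illustration and are not taken from the manuscript.

```python
from collections import Counter

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points
    so that the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

def purity(cluster_labels, division_labels):
    """Share of items whose division is the majority division of their cluster."""
    by_cluster = {}
    for c, d in zip(cluster_labels, division_labels):
        by_cluster.setdefault(c, Counter())[d] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(cluster_labels)

# Hypothetical 2-D "embeddings" of six abstracts and their divisions.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
divisions = ["HS", "AS", "HS", "AS", "NH", "AS"]

labels = kmeans(points, k=2)
score = purity(labels, divisions)
print(labels, round(score, 3))
```

In the toy data, one NH abstract lands in the same cluster as the two HS abstracts, which is the kind of cross-division proximity the discussion is about. Such a score describes agreement only and says nothing about which grouping serves authors and conveners better.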
  
  Comment: In the Discussion section, it could be interesting to add that the authors are currently working with the EGU Programme Committee and Copernicus Conference Manager to experiment with additional tools that could be integrated to the conference system and improve the experience attendees and authors may have in future EGU General Assemblies. These experiments can be useful to validate the NLP algorithms in real case applications.
  
  Answer: Yes, this is an important aspect to highlight, as it concerns the actual implementation of what are otherwise rather vague and theoretical thought experiments. We will highlight both the tool that recommends sessions to which participants can submit their abstracts and the tool that recommends relevant presentations to participants during the conference. For the latter, as mentioned in the previous peer review comments, we will highlight how it can help to systematically incorporate user feedback.
  
  Specific comments:
  Answer: Below we respond only to those specific comments that we did not adopt as suggested and that are not already resolved by the revised terminology for program groups, divisions, and sessions, which caused confusion and which we improved as described in our answers above.
  Line 34: I do not fully understand this sentence: could you clarify? I am not sure that “rarely align with ITS sessions” may be fully correct. The ITS sessions are very popular among authors, and the number of abstract submissions to the ITS PG has well increased over the years. Also, it is not clear to me what you mean by “tend to be… narrowly focused”, since the nature of the ITS PG sessions is broad, involving one or more disciplines.
  
  Answer: Our phrasing here was indeed not ideal and caused some misunderstanding. With ‘narrowly focussed’ we intended to say that the contributions in these sessions often do not overlap with other divisions, also considering how the sessions change between years. We acknowledge that ‘narrowly focussed’ is the wrong description here. We will adjust this sentence where we discuss the ITS sessions, based on your previous comment above.
  Line 84: suggestion: “the participant identifies most with”. Having said that, I believe that the issue is not only “which discipline a participant identifies with”, but how participants search the programme of the GA to prepare their own personal programme: do they stay in the same programme group or do they navigate through the other programme groups as well? As the number of abstracts increases, I agree that searching through the programme, inside the programme groups and their respective sessions, might become a laborious task though!
  Answer: The laborious task of searching for presentations is indeed a competing or complementary explanation. We add that both are valid reasons in the updated manuscript. As we do not have empirical evidence on the strength of each effect, it is important to highlight all potential mechanisms.
  Line 85: suggestion: “the possible presence of a bubble effect” => to me, a bubble effect in itself needs more than a disciplinary-based programme display organization; it requires also that participants do not look outside their disciplinary programme group, for instance.
  
  Answer: We want to address this comment in more detail because it makes a valid point that we will underline in the updated manuscript. We do not posit that a bubble effect exists, as it is difficult to quantify without empirical data on conference participant behavior. We also stress that this is an assumption, following the first peer review comment by Lina Stein on this issue.
  
  Line 95: change to: “… interdisciplinary connections, which in turn may lead to innovative…”. Also, I don’t understand the “higher productivity” here. Do you mean (even) more research papers being published? Projects funded? Please explain.
  Answer: Thank you for pointing out this vague phrasing. We decided to remove this part of the sentence, as it is very vague and the literature does not specify more concrete outcomes such as funding either.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3430-AC3
RC2:
'Comment on egusphere-2024-3430', Kirsten v. Elverfeldt, 21 Mar 2025

Dear authors,
thank you for submitting your interesting manuscript to Geoscience Communication! I enjoyed reading it and think that your use of NLP to reduce the risk of being inadvertently caught in knowledge silos when participating in the EGU GAs can be very useful. I do not want to repeat the points raised in previous comments (e.g. mixing up sessions with programme groups, the inherent inter- and transdisciplinary character of some programme groups that you might not have considered, the existence of co-organized sessions, or that your claim that attendees focus on “their” programme group only should be framed as a hypothesis rather than as a fact). Instead, I will focus on your methods and data section.
I am not an NLP specialist at all, but maybe precisely because of that I found your method description insufficiently transparent. Which pre-processing steps have been applied, and how can they affect the results? Exactly which NLP technique has been used (e.g. TF-IDF)? Do you plan to validate your results, e.g. by providing evidence in the future that users actually benefit from the NLP-produced recommendations? Furthermore, for laypeople with respect to NLP like me, a flowchart of the NLP pipeline could be helpful to understand the method.
Aside from this, I think the manuscript is very clearly written and well-organised. Your figures are informative and well-designed, and I especially liked the textual cartography of the geosciences landscape.

Citation: https://doi.org/10.5194/egusphere-2024-3430-RC2
- AC2:
  'Reply on RC2', Jan Sodoge, 31 Mar 2025
  Thanks for your positive feedback. Below we provide answers to your comments and how we will address these in the updated manuscript.
  Dear authors, thank you for submitting your interesting manuscript to Geoscience Communication! I enjoyed reading it and think that your use of NLP to reduce the risk of being inadvertently caught in knowledge silos when participating in the EGU GAs can be very useful. I do not want to repeat the points raised in previous comments (e.g. mixing up sessions with programme groups, the inherent inter- and transdisciplinary character of some programme groups that you might not have considered, the existence of co-organized sessions, or that your claim that attendees focus on “their” programme group only should be framed as a hypothesis rather than as a fact). Instead, I will focus on your methods and data section.
  
  Answer:
  Thank you for your positive feedback. We will address the issues you mention in the updated manuscript. First, we will add the statistics for the interdisciplinary sessions to the results section (see comment by Maria-Helena Ramos). Here, we will report the number of potentially missed presentations and add a note that these statistics for the interdisciplinary sessions need to be treated differently from those for the ‘traditional’ sessions/divisions. Second, we adjust the terminology for programme groups, sessions and divisions as described in the response to Lina Stein: we now differentiate between scientific divisions and sessions. We use the term scientific division to cover what you refer to here as programme groups, to provide a more intuitive wording for readers. In the introduction, we now state that scientific divisions and programme groups can be seen as overlapping or synonymous concepts. Third, we will add a new paragraph in the results on the co-organized sessions and how they can lower the share of ‘missed’ presentations. Fourth, concerning the attendees' focus on “their” programme group/division, we acknowledge, in line with the other two comments, that this assumption is highly stylized (yet it was required as an initial assumption for the presented analysis). As we lack empirical evidence on participant behavior, we add to the updated manuscript that there are other potential behaviors, such as those suggested by Maria-Helena, and that our assumption is very stylized and likely does not reflect reality. We also add that, in future research, we therefore aim to account for and measure such conference participant behavior to provide more accurate assessments of reality.
  
  I am not an NLP specialist at all, but maybe precisely because of that I found your method description insufficiently transparent. Which pre-processing steps have been applied, and how can they affect the results? Exactly which NLP technique has been used (e.g. TF-IDF)? Do you plan to validate your results, e.g. by providing evidence in the future that users actually benefit from the NLP-produced recommendations? Furthermore, for laypeople with respect to NLP like me, a flowchart of the NLP pipeline could be helpful to understand the method.
  
  Answer:
  Thank you for bringing up this point and for your perspective, which helped us rethink how to make the methodological procedure more explicit. To address this, we added an overview figure in the Appendix (adding it to the main text is not possible as only one figure is allowed). We also adjusted the method description accordingly: we added an initial statement on the overall procedure (“Our approach consists of 4 steps, which we outline in detail in this section: (1) collecting abstracts from EGU, (2) computing similarities between abstracts, (3) visualizing the presentations in a 2-dimensional space, and (4) running a simulation to estimate the hypothesized filter effect.”) and more detailed information on the required pre-processing steps. Concerning validation, we added to the discussion that we plan to account for user feedback in future research, which will be valuable for improving our recommendations.
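
For readers unfamiliar with such a pipeline, steps (2) and (4) of the procedure quoted above might be sketched roughly as follows. This is purely illustrative and not the authors' implementation: the discussion does not state which NLP technique was used, so TF-IDF with cosine similarity (the example named in the reviewer's question) is assumed here as a stand-in. The toy abstracts, the division labels, and the function names `tfidf_vectors`, `cosine`, and `missed_share` are all hypothetical. The sketch estimates, for each abstract, which share of its k most similar abstracts lies outside its own division; the manuscript's abstract reports such a share for the 10 most relevant contributions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: whitespace tokenization and lowercasing stand in for the
    real (unspecified) pre-processing steps.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def missed_share(docs, divisions, k=10):
    """Average share of each abstract's k most similar abstracts that sit
    in a different division (the stylized 'stay in own division' case)."""
    vectors = tfidf_vectors(docs)
    shares = []
    for i, own in enumerate(vectors):
        others = [j for j in range(len(docs)) if j != i]
        top = sorted(others, key=lambda j: cosine(own, vectors[j]), reverse=True)[:k]
        if top:
            missed = sum(divisions[j] != divisions[i] for j in top)
            shares.append(missed / len(top))
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical toy data: four invented "abstracts" with invented division labels.
toy_docs = [
    "flood drought river",
    "flood drought river basin",
    "seismic mantle crust",
    "seismic mantle crust plate",
]
toy_divisions = ["HS", "NH", "GD", "GD"]
print(missed_share(toy_docs, toy_divisions, k=1))
```

The sketch keeps the same stylized assumption discussed in the replies above, namely that a participant only sees presentations within one division. Any real analysis would need the actual similarity method and pre-processing described in the updated manuscript.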
  
  Citation: https://doi.org/10.5194/egusphere-2024-3430-AC2

Viewed

Total article views: 509 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
361	137	11	509	10	10

Views and downloads (calculated since 18 Dec 2024)

Viewed (geographical distribution)

Total article views: 486 (including HTML, PDF, and XML) Thereof 486 with geography defined and 0 with unknown origin.

Country	Rank	Views	%
United States of America	1	117	24
Germany	2	66	13
United Kingdom	3	38	7
France	4	35	7
India	5	23	4


Latest update: 08 Apr 2025

Jan Sodoge

CORRESPONDING AUTHOR

jan.sodoge@ufz.de

Department of Urban and Environmental Sociology, UFZ-Helmholtz Centre for Environmental Research, 04318, Leipzig, Germany

Institute of Environmental Science and Geography, University of Potsdam, 14476, Potsdam-Golm, Germany

Taís Maria Nunes Carvalho

https://orcid.org/0000-0001-8658-9781

Department of Urban and Environmental Sociology, UFZ-Helmholtz Centre for Environmental Research, 04318, Leipzig, Germany

Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Universität Leipzig, Leipzig, Germany

Mariana Madruga de Brito

https://orcid.org/0000-0003-4191-1647

Department of Urban and Environmental Sociology, UFZ-Helmholtz Centre for Environmental Research, 04318, Leipzig, Germany

Short summary

Thousands of geoscience abstracts are presented at the EGU General Assembly, but researchers often miss key insights by focusing on their own field. Using natural language processing (NLP), we help scientists find relevant research across disciplines. This approach breaks down boundaries, encouraging broader knowledge sharing and new interdisciplinary connections in geosciences.
