the Creative Commons Attribution 4.0 License.
GC Insights: Breaking the silos – leveraging NLP to encourage interdisciplinary interaction at the EGU
Abstract. Thousands of abstracts from various geoscience sub-fields are presented annually at the EGU General Assembly (GA), offering a rich resource for tracking scientific progress. However, rigid session groupings can limit cross-disciplinary exploration. Here, we show that participants focusing only on their broad disciplinary session miss an average of 44 % of the 10 most relevant contributions. To break this compartmentalization, we propose using natural language processing (NLP), enabling the geoscience community to explore the full breadth of knowledge beyond traditional disciplinary boundaries.
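For illustration, the headline metric (the share of the 10 most relevant contributions that fall outside one's own disciplinary grouping) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: a plain bag-of-words cosine similarity stands in for whatever semantic text representation the manuscript uses, and the function names `cosine` and `missed_share` as well as the toy abstracts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def missed_share(abstracts, query_idx, k=10):
    """Share of the k abstracts most similar to the query that sit outside its group.

    `abstracts` is a list of (group_label, text) pairs; the group label plays
    the role of the disciplinary grouping (assumption: one label per abstract).
    """
    vectors = [Counter(text.lower().split()) for _, text in abstracts]
    own_group = abstracts[query_idx][0]
    others = [i for i in range(len(abstracts)) if i != query_idx]
    # Rank all other abstracts by similarity to the query, most similar first.
    ranked = sorted(others, key=lambda i: cosine(vectors[query_idx], vectors[i]), reverse=True)
    top = ranked[:k]
    outside = sum(1 for i in top if abstracts[i][0] != own_group)
    return outside / len(top)


# Hypothetical toy data: (group label, abstract text).
toy = [
    ("HS", "flood forecasting with machine learning"),
    ("HS", "groundwater response to flood events"),
    ("NH", "machine learning for flood hazard forecasting"),
    ("AS", "aerosol transport over the ocean"),
]
share = missed_share(toy, query_idx=0, k=2)
print(share)
```

In this toy case, one of the two most similar abstracts belongs to a different group, so half of the top-2 would be missed by someone who only consults their own group. Averaging this quantity over all abstracts would give a figure comparable in spirit to the 44 % reported above.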
Status: open (extended)
RC1: 'Comment on egusphere-2024-3430', Lina Stein, 17 Jan 2025
The authors present an analysis of all abstracts submitted to the EGU General Assembly in the last five years. The abstracts are clustered according to semantic similarity of the abstract text. For each scientific division, the distribution of abstracts across the cluster space is evaluated to demonstrate that similar research is submitted across different scientific divisions within the EGU GA. The authors claim that, since attendees primarily attend sessions from their own scientific division, they might miss relevant research (bubble effect). I found the overview of abstract similarity across scientific divisions very informative. It demonstrates that the scientific divisions have more overlap than the strict separation necessary for conference organization might suggest.
Additionally, the authors propose a webtool, which can recommend EGU GA submissions based on similarity to a search string. I have actually used the prototype during the 2024 EGU and found it immensely helpful.
Overall, I find this article timely, relevant, and well-written. I only have a few minor suggestions for improvement.
Specific comments:
Your main assumption is that people will not attend presentations of research similar to their own submission due to it being in another scientific division. E.g. Line 10 “We hypothesize that this compartmentalization may inadvertently create knowledge silos, as EGU GA attendants tend to focus on their own scientific divisions, potentially missing relevant developments from other disciplines.” From my own (probably biased) experience I would say that cross-division attendance does take place a lot during EGU. And the EGU website does already offer a search tool that allows people to find abstracts across scientific divisions according to keyword searches. However, text similarity offers a much better tool to identify presentations of interest. Simple keyword searches often still produce a large number of results. While it is, of course, impossible to quantify which scientific division sessions are attended by whom during the GA, it would help your claim to offer some more numbers to demonstrate the infeasibility of manually checking all sections for relevant contributions. How many abstracts are submitted per section? How many sessions are registered per section?
One thing I found irritating is that the 22 scientific divisions or scientific sections are referred to as sessions. There are multiple sessions organised within each scientific division, but the authors split their data into scientific divisions and called them sessions. The label “Disciplinary sessions” on the EGU GA website is not a name; it means that these are the sessions related to one specific division.
Please clarify in the abstract and text what you mean by participant exploration or participant focus. It should be stated early on and more clearly that you refer to the choice of session/division attendance during the GA (and not, for example, the choice of session during the abstract submission process).
Lastly, it would be nice to add the number of people who have used the webtool to find relevant sessions during EGU24.
Citation: https://doi.org/10.5194/egusphere-2024-3430-RC1
AC1: 'Reply on RC1', Jan Sodoge, 27 Jan 2025
Thanks for your positive feedback. Below we provide answers to your comments and how we will address these in the updated manuscript.
Comment: Your main assumption is that people will not attend presentations of research similar to their own submission due to it being in another scientific division. E.g. Line 10 “We hypothesize that this compartmentalization may inadvertently create knowledge silos, as EGU GA attendants tend to focus on their own scientific divisions, potentially missing relevant developments from other disciplines.” From my own (probably biased) experience I would say that cross-division attendance does take place a lot during EGU. And the EGU website does already offer a search tool that allows people to find abstracts across scientific divisions according to keyword searches. However, text similarity offers a much better tool to identify presentations of interest. Simple keyword searches often still produce a large number of results. While it is, of course, impossible to quantify which scientific division sessions are attended by whom during the GA, it would help your claim to offer some more numbers to demonstrate the infeasibility of manually checking all sections for relevant contributions. How many abstracts are submitted per section? How many sessions are registered per section?
Answer: This is a good suggestion. Indeed, as you point out, our initial assumption is highly stylized and might not reflect reality at the EGU GA. In the updated manuscript, we state alongside this hypothesis that it likely does not reflect reality, but that such a simplification is necessary for our assessment. This is a limitation of our study that requires clearer acknowledgment. In fact, for the next EGU, we plan to survey individual users' attributes (e.g., which division(s) a user belongs to) to assess the quality of recommendations and select the most appropriate model. Regarding the current manuscript, we agree and will implement your suggestion of adding some numbers on the volume of abstracts to emphasize the 'infeasibility of manually checking'.
Comment: One thing I found irritating is that the 22 scientific divisions or scientific sections are referred to as sessions. There are multiple sessions organised within each scientific division, but the authors split their data into scientific divisions and called them sessions. The label “Disciplinary sessions” on the EGU GA website is not a name; it means that these are the sessions related to one specific division.
Please clarify in the abstract and text what you mean by participant exploration or participant focus. It should be stated early on and more clearly that you refer to the choice of session/division attendance during the GA (and not, for example, the choice of session during the abstract submission process).
Answer: We agree with your criticism of the unclear terminology. We used the term "disciplinary sessions" throughout the manuscript because it is the name used on the GA website, but we understand that "scientific division" is much clearer terminology. We therefore revised the entire manuscript and clarified in the text that we use the term "scientific divisions" to refer to what are sometimes called disciplinary sessions.
Comment: Lastly, it would be nice to add the number of people who have used the webtool to find relevant sessions during EGU24.
Answer: Sadly, the platform and subscription we used for hosting this app (shinyapps.io) give us no clear evidence of the number of users. The only metric we could quantify is 45 hours of usage time. However, this does not correspond to a particular number of users. For the upcoming EGU25, though, we plan to quantify such metrics in more depth by hosting the application on our own computational resources (see also the first comment on our ambitions to quantify more aspects to improve the evaluation).
On behalf of the authors,
Jan Sodoge
Citation: https://doi.org/10.5194/egusphere-2024-3430-AC1
CC1: 'Comment on egusphere-2024-3430, by Maria-Helena Ramos', Maria-Helena Ramos, 14 Mar 2025
Dear authors,
I enjoyed reading the paper and thank you for the initiative and study, which is very interesting. As current EGU Programme Committee Co-chair for the preparation of the General Assembly, I would like to list some suggestions below, which I hope may clarify some points and improve your manuscript.
General comments:
- Scientific sessions in the EGU GA are structured around disciplinary Programme Groups (PG), complemented by the EOS (Education and Outreach Sessions) programme group and the ITS (Inter- and Transdisciplinary Sessions) programme group, as they are called. Below I indicate where I think this organization should be clarified to avoid misunderstandings about the terminology. In several instances, I believe that “session” is misused in place of “programme group”. These instances are listed in detail below.
- While I understand the choice of not considering sessions in the ITS programme group (lines 33-36), I think some clarification may be given also in relation to the fact that some “disciplinary” programme groups are, in fact, highly “inter- or transdisciplinary” too. This is, to a high degree, notably the case of NH (Natural Hazards) and Earth and Space Science Informatics (ESSI), for instance, and, maybe to a lower degree, to some other disciplinary programme groups. How could this issue affect your analysis and results?
- Also, I was wondering how the co-organization of sessions by two or more programme groups was considered in the analysis, and how this might affect the results. Have you selected abstracts only from sessions that are not co-organized by other PGs? (Session co-organizations are marked as such in the EGU GA programme: “CR3.3 - Advances in sea-ice modelling: developments and new techniques. Co-organized by NP1/OS1”. In this example, the session is led by the CR programme group, but since it is co-organized it is also displayed in the NP and the OS programme groups.) Programme groups have different percentages of co-organized sessions (over the total number of led sessions), which can vary significantly.
- When looking at Fig. 1, I was also wondering how much the very different sizes of the PGs may affect the results. The EGU programme has PGs with 2,000+ abstracts in 200+ sessions, as well as PGs with 200+ abstracts in 10+ sessions. Would this affect the analysis and results? How/why not?
- If I understand well, the paper suggests that using an automatic k-means clustering approach could lead to more topically similar programme groups at the EGU GA. While this might be a good idea, I think one also has to consider the “errors” of the text similarity approach. Adding some comments on that in the Discussion section would be important here. Also, one has to consider that, currently, it is mostly up to the authors to decide where they want to submit their abstract. From my experience, I believe this choice depends on the disciplines, but also on the community around a given session (conveners team, authors of the same session in the past year(s), etc.). Under a clustering approach as proposed, this might be lost, if I understand well, and the system will make this choice for the author. In your opinion, what would be the obstacles/limitations here?
- In the Discussion section, it could be interesting to add that the authors are currently working with the EGU Programme Committee and Copernicus Conference Manager to experiment with additional tools that could be integrated into the conference system and improve the experience of attendees and authors at future EGU General Assemblies. These experiments can be useful to validate the NLP algorithms in real case applications.
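As an aside on the general comments above, one simple way to quantify how closely a clustering reproduces the programme groups is cluster purity: the fraction of abstracts that belong to the majority programme group of their cluster. The sketch below is only an illustration under stated assumptions. Purity is a metric chosen here for the example, not one taken from the manuscript; the cluster assignments are assumed to come from some prior step (for instance a k-means run with n=22); and the function name `cluster_purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(pg_labels, cluster_ids):
    """Fraction of items that fall in the majority programme group of their cluster.

    A value of 1.0 means each cluster contains abstracts from a single
    programme group; lower values mean clusters mix several groups.
    """
    members = defaultdict(list)
    for pg, cluster in zip(pg_labels, cluster_ids):
        members[cluster].append(pg)
    # For each cluster, count how many items belong to its most common group.
    majority_total = sum(Counter(pgs).most_common(1)[0][1] for pgs in members.values())
    return majority_total / len(pg_labels)


# Hypothetical toy example: six abstracts, two clusters.
pg_labels = ["HS", "HS", "NH", "AS", "AS", "CL"]
cluster_ids = [0, 0, 0, 1, 1, 1]
purity = cluster_purity(pg_labels, cluster_ids)
print(round(purity, 3))
```

Note that purity is sensitive to group size: a few very large programme groups can dominate many clusters, which relates to the question above about PGs with 2,000+ abstracts versus PGs with 200+ abstracts.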
Specific comments:
General suggestion: replace “attendants” with “attendees”
Line 7: EGU General Assembly gathers over 20,000 participants worldwide. I suggest updating the numbers here.
Lines 8-9: change to: “…the EGU GA is currently structured into 22 disciplinary programme groups (PGs) and further sessions included in the ITS (Inter- and Transdisciplinary Sessions) programme group and the EOS (Education and Outreach Sessions) programme group, among other Union-wide sessions. We hypothesize…”
Line 10: change to: “…as EGU GA attendees might tend to focus on the disciplinary PG related to their own scientific discipline, potentially…” => I believe that “might” should be used here as, to my knowledge, so far there is no study that has put this into evidence.
Line 22: change to: “… the compartmentalization at the EGU GA programme presentation may…” => since there are instruments for co-organizations as well as other Union-wide sessions in the EGU GA programme, I believe that a precision must be added here to say that it refers to the EGU GA programme presentation (not to the EGU GA as a whole).
Line 30: do you mean “corresponding session identification (session ID)”? I think “session data” is a bit unclear.
Line 31: change “presented in 22 disciplinary sessions” to “presented in the 22 disciplinary programme groups”, and Line 31-32: change to “disciplinary programme groups” (or “disciplinary PGs”) => I think you are referring to PGs, rather than sessions. “Sessions” are the individual sessions inside the PGs, and there are more than 22 of those at EGU GA (usually, there are 1,000+ sessions in an EGU GA).
Line 32: change to: “…may vary significantly…”
Line 33: suggestion to change to: “… identify their research as “interdisciplinary”,
Line 34: I do not fully understand this sentence: could you clarify? I am not sure that “rarely align with ITS sessions” may be fully correct. The ITS sessions are very popular among authors, and the number of abstract submissions to the ITS PG has well increased over the years. Also, it is not clear to me what you mean by “tend to be… narrowly focused”, since the nature itself of the sessions in the ITS PG is to be broad, involving one or more disciplines.
Line 35: change to: “… concentrating only on abstracts submitted to sessions within disciplinary programme groups provides…”
Lines 50-53: I got a bit confused here: do you mean “sessions” or “programme groups”? I have the impression that you mean “programme groups” (at least on lines 52 and 53); please check (see also my general comment above)
Lines 54-55: change to: “…to evaluate how well the grouping of the EGU GA abstracts in the 22 PGs compares with the clustering using the k-means clustering algorithm, considering the same number of clusters as the number of disciplinary PGs (n=22).” Please, pay attention to the fact that there may be a terminology-based confusion here also between "sessions" and “programme groups”.
Line 60: suggestion to change to “…landscape of the EGU GA abstract sample analysed here, …”
Line 61: change “disciplinary session” to “disciplinary PG”
Line 62: change “… and climate (CL) session…” to “… and climate (CL) programme groups...”
Line 64: change “disciplinary sessions” to “disciplinary PGs”
Lines 64-65: suggestion to change to “… to a research abstract on modelling… include an abstract on evaluating…”
Line 67: I think here also you mean “presented in different programme groups”, not different “sessions”; is that so? Or maybe “presented in different sessions, belonging to different programme groups” (?)
Line 67: Maybe complete with the following: “… would potentially be missed if an attendee only consults one programme group and the sessions are not co-organized”. => I think this is important to mention since consulting other PGs when preparing a “personal programme” and displaying co-organized sessions in all the programmes of the PGs involved in the co-organization (as done in the EGU GA programme) would minimize this effect.
Line 68: change to “disciplinary programme groups”
Line 69: also here, do you mean “within the programme group where their abstract was submitted to potentially…”? Instead of “session”?
Line 70: Suggestion: I would rather say “…. of the 10 contributions most similar to their abstract”.
Line 70 and Line 72: change “session” to “programme group” in the three occurrences in these lines. AS and ST, as well as GI and NP, are not sessions, but “programme groups”.
Lines 73-74: I think the sentence is a bit confusing (I could not grasp the message/meaning). Maybe rephrase it? Sentence: “…reduce the share of relevant contributions covered by the own “session=> programme group”
Line 75: change to “using disciplinary programme groups”
Lines 77-78: I am a bit confused here: do you mean to group “similar abstracts”? I do not understand the term “group sessions”. Sessions are grouped in programme groups. Maybe you mean: “to build programme groups that instead of focusing on a discipline would bring together statistically similar abstracts”.
Line 81: the sentence may be missing something: to keep track of what exactly? Have you consulted participants on that? And organizers (what type of organizers? Conveners? Programme Group chairs?). I think it might be useful to provide more details here.
Line 82: replace by “last five General Assemblies”
Line 84: suggestion: “the participant identifies most with”. Having said that, I believe that the issue is not only “which discipline a participant identifies with”, but how participants search the programme of the GA to prepare their own personal programme: do they stay in the same programme group or do they navigate through the other programme groups as well? As the number of abstracts increases, I agree that searching through the programme, inside the programme groups and their respective sessions, might become a laborious task though!
Line 85: suggestion: “the possible presence of a bubble effect” => to me, a bubble effect in itself needs more than a disciplinary-based programme display organization; it requires also that participants do not look outside their disciplinary programme group, for instance.
Line 86: replace “sessions” with “programme groups” or “disciplines”
Line 87: could it be related to the fact that in EGU GA NP and GI are also smaller programme groups, compared to PGs such as AS and HS, for instance?
Line 88: replace “sessions” with “programme groups”
Line 94: replace “session” with “programme group”
Line 95: change to: “… interdisciplinary connections, which in turn may lead to innovative…”. Also, I don’t understand the “higher productivity” here. Do you mean (even) more research papers being published? Projects funded? Please explain.
Lines 96-97: change “EGU GA 2024” to “EGU24 General Assembly”.
Line 98: replace “sessions” with “programme groups”
Line 100: replace “sessions” with “programme groups”
Figure 1 caption: change “session” to “programme group” in the four occurrences.
Figure 1A: I think it could be slightly re-worded to avoid potential misunderstandings. Suggestion: “…
Thank you again for the work done and the contribution to making EGU General Assemblies an even better and exciting event!
Citation: https://doi.org/10.5194/egusphere-2024-3430-CC1
Viewed
- HTML: 312
- PDF: 116
- XML: 8
- Total: 436
- BibTeX: 8
- EndNote: 8