A Hybrid Intelligence Method for Argument Mining
Authors: Michiel van der Meer, Enrico Liscio, Catholijn M. Jonker, Aske Plaat, Piek Vossen, Pradeep K. Murukannaiah
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HyEnA on three citizen feedback corpora. We find that, on the one hand, HyEnA achieves higher coverage and precision than a state-of-the-art automated method when compared to a common set of diverse opinions, justifying the need for human insight. On the other hand, HyEnA requires less human effort and does not compromise quality compared to (fully manual) expert analysis, demonstrating the benefit of combining human and artificial intelligence. |
| Researcher Affiliation | Academia | Michiel van der Meer and Aske Plaat: Leiden Institute for Advanced Computer Science (LIACS), Leiden University. Enrico Liscio, Catholijn M. Jonker, and Pradeep K. Murukannaiah: Interactive Intelligence (II), Delft University of Technology. Piek Vossen: Computational Linguistics & Text Mining Lab (CLTL), Vrije Universiteit Amsterdam. |
| Pseudocode | No | The paper describes its methods and processes in detail, often with figures (e.g., Figure 2: Overview of the HyEnA method) and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps. |
| Open Source Code | Yes | We also provide our code, annotation guidelines, and experimental details in the supplementary materials (van der Meer et al., 2024a). |
| Open Datasets | Yes | Our opinion corpora are composed of citizens feedback on COVID-19 relaxation measures, a contemporary topic. The feedback was gathered in April and May 2020 using the Participatory Value Evaluation (PVE) method (Mouter et al., 2021). ... Since we use data from a publicly run citizen feedback experiment, we observe that some options attracted more pro comments than others. |
| Dataset Splits | No | In the first phase of HyEnA, human annotators extract individual key argument lists by analyzing the opinion corpus. ... In each corpus, five annotators annotated 51 opinions each, for a total of 255 opinions per corpus. Of the 51 opinions, the first is selected randomly, and the following 50 are selected by FFT. This number of opinions was empirically selected to make the annotation feasible within a maximum of one hour. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used for running its experiments or models. It mentions using various models (S-BERT, BERTopic, ChatGPT, Llama) but not the underlying hardware. |
| Software Dependencies | No | The paper mentions using S-BERT (Reimers & Gurevych, 2019), the Huggingface Model Hub, BERTopic (Grootendorst, 2022), the Microsoft Azure Translation service, ChatGPT (Ouyang et al., 2022), and Llama (Touvron et al., 2023). However, it does not specify version numbers for these components or for any other libraries used in the implementation. |
| Experiment Setup | Yes | In the first phase of HyEnA, each annotator extracts a key arguments list from an opinion corpus. In each corpus, five annotators annotated 51 opinions each, for a total of 255 opinions per corpus. Of the 51 opinions, the first is selected randomly, and the following 50 are selected by FFT. ... We instantiate the S-BERT model M_S using the Huggingface Model Hub. ... We train a BERTopic model on each opinion corpus, generating 59, 56, and 72 topics for the young, immune, and reopen corpora, respectively. ... We experiment with two well-known graph clustering algorithms: (1) Louvain clustering (Blondel et al., 2008) uses network modularity to identify groups of vertices based on a resolution parameter r. (2) Self-tuning spectral clustering (Zelnik-Manor & Perona, 2004) uses dimensionality reduction in combination with k-means to obtain clusters, where k is the desired number of clusters. We select the parameters of these algorithms to minimize the error metric E shown in Eq. 3. ... Prompt 1 (ChatGPT): "Consider the context of the COVID-19 pandemic and the following arguments: Argument 1 ... Argument k Write a key argument that summarizes the above arguments, and make it short and concise." Prompt 2 (Llama): "Consider the context of the COVID-19 pandemic and the following arguments: Argument 1 ... Argument k A short and concise key argument that summarizes the above arguments is:" |
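
The Experiment Setup row describes two steps that are easy to picture in code: picking 51 opinions per annotator (one at random, then 50 by FFT) and filling Prompt 1 with a cluster of arguments. The sketch below is illustrative only and is not the authors' implementation. It assumes that FFT stands for farthest-first traversal, that opinions are compared by cosine distance between non-zero embedding vectors, and that the prompt places one argument per line. The function names `cosine_distance`, `farthest_first_selection`, and `build_key_argument_prompt` are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def farthest_first_selection(embeddings, n_select, seed=0):
    """Select n_select indices: the first at random, each later one being the
    item whose distance to its nearest already-selected item is largest.

    Assumption: cosine distance over embeddings; the source only says "FFT".
    """
    if not 0 < n_select <= len(embeddings):
        raise ValueError("n_select must be between 1 and len(embeddings)")
    rng = random.Random(seed)
    first = rng.randrange(len(embeddings))
    selected = [first]
    # min_dist[i] holds the distance from item i to its closest selected item.
    min_dist = [cosine_distance(e, embeddings[first]) for e in embeddings]
    while len(selected) < n_select:
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances with the newly chosen item.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], cosine_distance(e, embeddings[nxt]))
    return selected


def build_key_argument_prompt(arguments):
    """Fill Prompt 1 with the given arguments.

    Assumption: one argument per line; the wording is copied from the table,
    but the original line layout is not recoverable from it.
    """
    lines = ["Consider the context of the COVID-19 pandemic and the following arguments:"]
    lines.extend(arguments)
    lines.append(
        "Write a key argument that summarizes the above arguments, "
        "and make it short and concise."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    print(farthest_first_selection(toy_embeddings, n_select=2))
    print(build_key_argument_prompt(["Argument A", "Argument B"]))
```

In the setting described above, `n_select` would be 51 and `embeddings` would hold one vector per opinion, for example from an S-BERT model. The greedy rule favors opinions that differ from everything already chosen, which matches the stated aim of showing annotators a diverse sample within a one-hour budget.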