EAVIT: Efficient and Accurate Human Value Identification From Text Data via LLMs

Authors: Wenhao Zhu, Yuhang Xie, Guojie Song, Xin Zhang

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental "5 Experiments. 5.1 Value Identification on Public Datasets. Datasets and Methods. We conducted experiments on three public and manually labelled datasets: ValueNet (Augmented) [Qiu et al., 2022], Webis-ArgValues-22 [Kiesel et al., 2022], and Touché23-ValueEval [Kiesel et al., 2023]. ... For all datasets, we report the accuracy and the officially recommended F1-score on the validation and test data."
Researcher Affiliation Academia Wenhao Zhu1, Yuhang Xie1, Guojie Song1 and Xin Zhang2. 1State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University. 2School of Psychological and Cognitive Sciences, Peking University. EMAIL, EMAIL
Pseudocode No The paper describes the three stages of the EAVIT method: (1) Training value detector; (2) Generating candidate value set; (3) Final value identification using LLMs, and provides prompt templates. However, it does not include structured pseudocode or algorithm blocks with numbered steps.
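Since the paper provides no pseudocode, the three stages it describes can be sketched roughly as follows. This is an illustrative outline only, not the paper's implementation: the function names are invented, the detector is replaced by a toy keyword heuristic, and the final LLM call is stubbed out.

```python
# Hedged sketch of the three-stage EAVIT pipeline described in the review.
# All names here are illustrative; the real method trains a value detector
# and prompts an LLM in stage 3.

def detect_value_probs(text, value_names):
    """Stage 1 stand-in: a trained value detector would return one
    probability per human value. Here, a toy keyword heuristic."""
    return {v: (1.0 if v.lower() in text.lower() else 0.1) for v in value_names}

def candidate_value_set(probs, p_low=0.2, p_high=0.8):
    """Stage 2: values above p_high are kept, values below p_low are
    dropped, and the uncertain middle band is deferred to the LLM."""
    certain = [v for v, p in probs.items() if p >= p_high]
    uncertain = [v for v, p in probs.items() if p_low < p < p_high]
    return certain, uncertain

def identify_values(text, value_names):
    """Stage 3 stand-in: a real system would prompt an LLM with the
    candidate set; here we simply accept the high-confidence values."""
    probs = detect_value_probs(text, value_names)
    certain, uncertain = candidate_value_set(probs)
    # The real method would ask the LLM to adjudicate `uncertain`.
    return sorted(certain)

print(identify_values("A story about loyalty and tradition.",
                      ["Loyalty", "Tradition", "Hedonism"]))
# → ['Loyalty', 'Tradition']
```

The point of the staging is that the cheap detector filters most values up front, so the expensive LLM call only has to reason over a small candidate set.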
Open Source Code No The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide a link to a code repository. It only refers to an extended version of the paper on arXiv.
Open Datasets Yes "We conducted experiments on three public and manually labelled datasets: ValueNet (Augmented) [Qiu et al., 2022], Webis-ArgValues-22 [Kiesel et al., 2022], and Touché23-ValueEval [Kiesel et al., 2023]. ... Our experiments will use these public, human-annotated datasets as the basis for training and validation."
Dataset Splits No The paper mentions using 'validation and test data' and refers to the 'original Touché23-ValueEval train dataset', but it does not provide specific details on the dataset splits, such as exact percentages, sample counts, or the methodology used for splitting, within the provided text. It states 'Details can be found in Appendix', which is not included.
Hardware Specification Yes "With QLoRA [Dettmers et al., 2023; Hu et al., 2021], fine-tuning Llama2-13b-chat can be executed on 4 Nvidia RTX 4090 GPUs with 24GB VRAM."
Software Dependencies No The paper mentions specific language models used (Llama2-13b-chat, GPT-4o-mini, GPT-4o, GPT-4) and techniques like QLoRA and Alpaca format. However, it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python version) used for implementation.
Experiment Setup Yes In Section 4.2, it specifies parameters for candidate set generation: 'Usually we set L = 5 to achieve the balance of reducing randomness and efficiency. Next, we set two thresholds 0 < p_low < p_high < 1.' In Section 5.1, it further clarifies: 'For EAVIT, we set p_low = 0.2, p_high = 0.8 and report the results of both the value detector and the entire method.' It also mentions reporting 'the average and std of 3 random individual runs'.
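The quoted setup (averaging over L = 5 runs, then banding by p_low = 0.2 and p_high = 0.8) can be illustrated with a short sketch. This is an assumed reading of the thresholding, not code from the paper; the function name and input format are invented for the example.

```python
# Illustrative sketch (not the paper's code): average detector probabilities
# over L runs to smooth sampling randomness, then split values into bands
# using the two thresholds quoted above.
from statistics import mean

def partition_values(prob_runs, p_low=0.2, p_high=0.8):
    """prob_runs maps each value name to its per-run probabilities
    (the review quotes L = 5; fewer runs shown here for brevity)."""
    avg = {v: mean(ps) for v, ps in prob_runs.items()}
    definite = [v for v, p in avg.items() if p >= p_high]
    candidates = [v for v, p in avg.items() if p_low < p < p_high]
    # Values with average probability <= p_low are discarded outright.
    return definite, candidates

runs = {"Security": [0.9, 0.85, 0.95],
        "Power": [0.5, 0.4, 0.6],
        "Hedonism": [0.1, 0.05, 0.1]}
print(partition_values(runs))
# → (['Security'], ['Power'])
```

Widening the (p_low, p_high) band sends more values to the LLM for adjudication (more accurate, more expensive); narrowing it makes the cheap detector decide more cases on its own.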