SocialSim: Towards Socialized Simulation of Emotional Support Conversation
Authors: Zhuang Chen, Yaru Cao, Guanqun Bi, Jincenzi Wu, Jinfeng Zhou, Xiyao Xiao, Si Chen, Hongning Wang, Minlie Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further train a chatbot on SSConv and demonstrate its state-of-the-art performance in both automatic and human evaluations. To verify the effectiveness of the proposed SocialSim framework for ESC simulation, we conduct comparative experiments between the generated SSConv corpus and existing datasets... |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Engineering, Central South University 2CoAI Group, DCST, IAI, BNRIST, Tsinghua University 3Northwest Minzu University 4The Chinese University of Hong Kong 5Lingxin AI 6Academy of Arts & Design, Tsinghua University |
| Pseudocode | No | The paper describes methods and processes in narrative text, such as the design of cognitive reasoning and the sequential traversal of reasoning nodes, but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We select PsyQA (Sun et al. 2021), a Chinese psychological health support dataset in a Q&A format. ... Sun, H.; Lin, Z.; Zheng, C.; Liu, S.; and Huang, M. 2021. PsyQA: A Chinese Dataset for Generating Long Counseling Text for Mental Health Support. ArXiv, abs/2106.01702. |
| Dataset Splits | Yes | We select two test sets: SSConv-test is split from SSConv with the ratio train:test=9:1, representing a diverse range of help-seeking scenarios across various topics. ESConv-test consists of 200 held-out dialogues from ESConv, validating the model's performance on in-domain scenarios with more focused and limited topics. |
| Hardware Specification | Yes | The training is conducted... on one Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using Llama-2-7b as the backbone language model and AdamW optimizer with LoRA, but it does not specify version numbers for any software libraries, frameworks, or programming languages used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The training is conducted for 5 epochs using the AdamW optimizer (Loshchilov and Hutter 2019) with LoRA (Hu et al. 2021), a learning rate of 5e-5, and a batch size of 8 on one Tesla V100 GPU. |
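The 9:1 train/test split reported for SSConv-test could be reproduced along the lines below. This is a minimal sketch, assuming a shuffled random split; the function name, seed, and use of Python's standard library are illustrative and not taken from the paper.

```python
import random

def split_corpus(dialogues, test_ratio=0.1, seed=42):
    """Shuffle a dialogue corpus and split it into train/test sets.

    Hypothetical helper mirroring the train:test = 9:1 split described
    for SSConv-test; the seed value is an assumption.
    """
    items = list(dialogues)
    rng = random.Random(seed)
    rng.shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    # Last n_test shuffled items become the held-out test set.
    return items[n_test:], items[:n_test]

# e.g. splitting 1000 placeholder dialogue IDs: 900 train / 100 test
train, test = split_corpus(range(1000))
```

A fixed seed keeps the split reproducible across runs, which matters when reporting results on a corpus-derived test set.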
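The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration record, which is how one would typically pin them down for a reproduction attempt. The key names below are illustrative, not tied to any specific training library; only the values come from the paper's reported setup.

```python
# Hypothetical fine-tuning configuration mirroring the reported setup.
train_config = {
    "base_model": "Llama-2-7b",   # backbone language model
    "peft_method": "LoRA",        # parameter-efficient fine-tuning
    "optimizer": "AdamW",
    "learning_rate": 5e-5,
    "batch_size": 8,
    "epochs": 5,
    "device": "1x Tesla V100",
}
```

Note that the paper does not report library versions (the Software Dependencies row is "No"), so a faithful reproduction would still have to fix framework versions independently.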