SS-GEN: A Social Story Generation Framework with Large Language Models

Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Empirical results show that our dataset significantly improves LLMs' performance on SS-GEN, at lower costs and with simpler instructions. In this section, we conduct extensive experiments to evaluate the performance of smaller, cost-effective language models which are trained and tested using our curated dataset for SS-GEN, considering that powerful closed-source LLMs require very complex instructions and expensive API fees."
Researcher Affiliation: Collaboration. "1Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University; 2Tencent Hunyuan, Beijing; 3Jarvis Research Center, Tencent YouTu Lab; 4School of Computer Science and Engineering, Central South University; 5CoAI Group, DCST, IAI, BNRIST, Tsinghua University"
Pseudocode: No. The paper describes the STARSOW strategy with steps like "Taking Root", "Branching Out", "Bearing Star Fruits", and "Gardening Work", but these are presented as descriptive paragraphs rather than structured pseudocode or an algorithm block.
Open Source Code: Yes. "The code, prompt, data and technical appendix are available at https://github.com/MIMIFY/SS-GEN"
Open Datasets: Yes. "We utilize the Social Story dataset constructed through STARSOW for SS-GEN. The code, prompt, data and technical appendix are available at https://github.com/MIMIFY/SS-GEN"
Dataset Splits: Yes. "We then divide the data into training, validation, and testing sets with a ratio of 8:1:1."
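The 8:1:1 split quoted above can be sketched in a few lines of Python. This is a minimal illustration only: the paper states the ratio but not the splitting code, shuffling strategy, or random seed, all of which are assumptions here.

```python
import random

def split_8_1_1(examples, seed=42):
    """Shuffle and split a list of examples into train/val/test at 8:1:1.

    The seed and the shuffle-then-slice strategy are assumptions; the
    paper only reports the 8:1:1 ratio, not how the split was performed.
    """
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    n = len(data)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_8_1_1(range(1000))
```

With 1,000 examples this yields 800/100/100 train/validation/test instances.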
Hardware Specification: Yes. "We utilize the Parameter-Efficient Fine-Tuning (PEFT) strategy, integrated with Low-Rank Adaption (LoRA) using the LLaMA Factory (Zheng et al. 2024), to test and fine-tune these models on four NVIDIA GeForce RTX 4090 GPUs."
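LoRA, named in the row above, freezes the pretrained weight matrix W (d_out x d_in) and trains only a low-rank update B·A. A back-of-the-envelope sketch of the resulting parameter savings, in pure Python; the rank of 8 and the 4096-dimensional projection are illustrative assumptions, not values reported in the paper:

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters for one LoRA adapter: B (d_out x r) plus A (r x d_in).

    The frozen weight W (d_out x d_in) is left untouched; only the two
    low-rank factors receive gradients, which is what makes PEFT feasible
    on consumer GPUs like the RTX 4090s mentioned above.
    """
    return rank * (d_out + d_in)

# Illustrative 4096 x 4096 projection (typical size in 7B-class models)
full = 4096 * 4096                                # params if fully fine-tuned
lora = lora_trainable_params(4096, 4096, rank=8)  # params with a rank-8 adapter
```

Here the adapter trains 65,536 parameters instead of 16,777,216, a 256x reduction for this single layer.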
Software Dependencies: No. The paper mentions using "LLaMA Factory" but does not provide a specific version number. It also lists several LLM models (Mistral, Gemma, Llama 3) and mentions the Berkeley Neural Parser, but no version numbers are provided for any of these software components.
Experiment Setup: Yes. "We utilize the Parameter-Efficient Fine-Tuning (PEFT) strategy, integrated with Low-Rank Adaption (LoRA) using the LLaMA Factory (Zheng et al. 2024), to test and fine-tune these models on four NVIDIA GeForce RTX 4090 GPUs. Besides, we utilize the same precise title-to-story prompt as illustrated in Figure 5 for both training and testing. This simple prompt is designed to enhance the model's capacity to construct a Social Story from the provided title (the intervention goal of the story). The prompt itself is: Develop a concise, clear, straightforward, positive and supportive Social Story titled {title} for children and teens with autism, 200-300 words, that promotes their social understanding and boosts their participation in daily activities, fostering independence and confidence. Title: {title}"
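The title-to-story prompt quoted above fills the same title into two slots. A minimal templating sketch; the exact rendering of the paper's Figure 5 prompt is assumed from the quoted text, and the example title is hypothetical:

```python
# Prompt text copied from the quoted title-to-story prompt above.
PROMPT_TEMPLATE = (
    "Develop a concise, clear, straightforward, positive and supportive "
    "Social Story titled {title} for children and teens with autism, "
    "200-300 words, that promotes their social understanding and boosts "
    "their participation in daily activities, fostering independence and "
    "confidence. Title: {title}"
)

def build_prompt(title):
    """Fill the given title into both {title} slots of the prompt."""
    return PROMPT_TEMPLATE.format(title=title)

# Hypothetical intervention goal, used as the story title.
prompt = build_prompt("Waiting My Turn at School")
```

The same string would be used at both training and test time, matching the paper's statement that one precise prompt serves both phases.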