Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simulating Human-like Daily Activities with Desire-driven Autonomy

Authors: Yiding Wang, Yuxuan Chen, Fangwei Zhong, Long Ma, Yizhou Wang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on Concordia, a text-based simulator, to demonstrate that our agent generates coherent, contextually relevant daily activities while exhibiting variability and adaptability similar to human behavior. A comparative analysis with other LLM-based agents demonstrates that our approach significantly enhances the rationality of the simulated activities.
Researcher Affiliation | Academia | Institute for Artificial Intelligence, Peking University; The University of Hong Kong; School of Artificial Intelligence, Beijing Normal University; Academy for Advanced Interdisciplinary Studies, Peking University; State Key Laboratory of General Artificial Intelligence, BIGAI; Center on Frontiers of Computing Studies, School of Computer Science, Nat'l Eng. Research Center of Visual Technology, Peking University
Pseudocode | No | The Desire-driven Autonomous Agent (D2A) performs four key procedures using the Value System and the Desire-driven Planner to generate the activity a_t: Qualitative Value Description, Activity Proposal, Activity Evaluation, and Activity Selection. These procedures are described in text but no formal pseudocode block or algorithm is provided.
Open Source Code | Yes | Project page: https://sites.google.com/view/desire-driven-autonomy
Open Datasets | No | We conduct experiments on Concordia, a text-based simulator... The environments are based on Concordia (Vezhnevets et al., 2023). The paper focuses on simulating activities within a self-developed environment and does not provide concrete access information for a publicly available or open dataset in the traditional sense.
Dataset Splits | No | We conducted 15 trials for each agent under identical initialization settings across all experiments. The paper describes simulation trials and environment setup rather than using predefined training/test/validation splits from a static dataset.
Hardware Specification | No | For all four agent-based methods, we used LLaMA 3.1-70B as the default backbone model for both the agents and the environment controller. While specific software (LLMs) is mentioned, the paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | Yes | For all four agent-based methods, we used LLaMA 3.1-70B as the default backbone model for both the agents and the environment controller. To evaluate the adaptability of our framework across different large language models, we tested it with Qwen 2.5:72B (Yang et al., 2024)...
Experiment Setup | Yes | Specifically, for the indoor, single-agent environment, we define 11 dimensions of desired value... To quantitatively track changes in the level of these desires and qualitatively translate numerical values into descriptive states, we apply a [0-10] Likert scale for each dimension... The initial value v0_d for each desire dimension is randomly selected from the range [0, 10]... In the Activity Proposal procedure... we prompt the agent to generate N candidate (also referred to as planner width, which is set to 3 in our default experimental configuration) activities...
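The quoted setup and the four D2A procedures can be sketched as a minimal loop. This is an illustrative reconstruction only, not the paper's released code: all function and variable names are hypothetical, the LLM calls are replaced by placeholders, and only the quoted parameters (11 desire dimensions, a [0-10] Likert scale, random initial values v0_d, planner width N = 3) come from the source.

```python
import random

DIMENSIONS = 11     # 11 desire dimensions (quoted setup)
PLANNER_WIDTH = 3   # N = 3 candidate activities (quoted default)

def init_desires(rng):
    # Initial value v0_d for each dimension drawn uniformly from [0, 10].
    return [rng.uniform(0, 10) for _ in range(DIMENSIONS)]

def describe(values):
    # Qualitative Value Description: map numeric levels to coarse states
    # (the three-way labeling here is an illustrative stand-in).
    def label(v):
        return "low" if v < 3.3 else "medium" if v < 6.7 else "high"
    return [label(v) for v in values]

def propose(descriptions, rng):
    # Activity Proposal: stand-in for prompting the LLM with the
    # qualitative descriptions to obtain N candidate activities.
    return [f"candidate-{i}" for i in range(PLANNER_WIDTH)]

def evaluate(candidates, rng):
    # Activity Evaluation: placeholder scoring; the real agent would
    # query the LLM for how well each candidate satisfies the desires.
    return {c: rng.random() for c in candidates}

def select(scores):
    # Activity Selection: pick the highest-scoring activity a_t.
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = random.Random(0)
    values = init_desires(rng)
    scores = evaluate(propose(describe(values), rng), rng)
    print(select(scores))
```

The four functions mirror the four procedures named in the Pseudocode row above; swapping the placeholder bodies for LLM calls would recover the agent's actual decision loop.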