Affordable Generative Agents
Authors: Yangbin Yu, Qin Zhang, Junyou Li, Qiang Fu, Deheng Ye
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple environments show the effectiveness and efficiency of our proposed framework. Also, we delve into the mechanisms of emergent believable behaviors lying in LLM agents, demonstrating that agents can only generate finite behaviors in fixed environments, based upon which, we understand ways to facilitate emergent interaction behaviors. Our code is publicly available at: https://github.com/AffordableGenerativeAgents/Affordable-Generative-Agents. We propose several evaluation methods and conduct extensive experiments in benchmarking environments to validate the effectiveness of our framework. |
| Researcher Affiliation | Industry | Yangbin Yu (Tencent), Qin Zhang (Tencent), Junyou Li (Tencent), Qiang Fu (Tencent), Deheng Ye (Tencent) |
| Pseudocode | No | The paper does not contain any explicit sections or figures labeled as 'Pseudocode' or 'Algorithm'. While prompt templates are provided in the appendix, they are not structured as pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/AffordableGenerativeAgents/Affordable-Generative-Agents. |
| Open Datasets | Yes | To examine the applicability of our techniques, we have conducted extensive experiments using well-known environments, including the Stanford Town (Park et al., 2023) and the Virtual Home (Puig et al., 2018), to demonstrate that, while achieving the same performance, the consumption of generating believable agent behaviors can be significantly reduced. |
| Dataset Splits | No | The paper describes conducting experiments in simulation environments, such as "multiple two game day simulations of 3-person town" and "25-person town" within the Generative Agents framework, and agents experiencing "a full day at home" in Virtual Home. It focuses on interactions and emergent behaviors within these simulations rather than using predefined training/test/validation splits from a static dataset, thus no such splits are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions that all agents are empowered with "GPT-3.5-Turbo (Wu et al., 2023)" and "GPT-4 (Achiam et al., 2023)", and uses "text-embedding-ada-002". However, these are specific large language models and an embedding model, not software libraries or solvers with specific version numbers (e.g., Python 3.8, PyTorch 1.9) required to reproduce the environment setup. |
| Experiment Setup | Yes | In our implementation, we set the threshold at 0.97. In Social Memory, all initial relationships between agents are set to Unknown and subsequently updated following each interaction. We conduct interviews with agents to evaluate their ability to remember past events, strategize for future tasks based on those memories, react appropriately to unexpected situations, and reflect on their previous actions to improve their future performance. We use the same questions as Generative Agents to inquire about five aspects: self-knowledge, memory, plans, reactions, and reflections. The questions and answers are shown in Appendix C. Appendices D.1 and D.2 provide detailed prompt templates and criteria for evaluating activities and dialogues using GPT-4. Appendix F describes the implementation of 'mind wandering', including a weighted sampling formula based on DBSCAN for event selection. |
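The Experiment Setup row quotes a similarity threshold of 0.97, and the Software Dependencies row notes the use of text-embedding-ada-002. A minimal sketch of how such a threshold could gate reuse of cached agent responses is shown below; this is an illustrative assumption, not the paper's confirmed mechanism, and `lookup_cached` and the vector cache layout are hypothetical names introduced here.

```python
import math

# Quoted threshold from the Experiment Setup row; here it is assumed
# (hypothetically) to gate embedding-similarity matching against a cache.
THRESHOLD = 0.97

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lookup_cached(query_vec, cache):
    """Return the cached response whose key vector is most similar to
    query_vec, provided that similarity clears THRESHOLD; else None.

    `cache` is a list of (embedding_vector, response) pairs. In a real
    system the vectors would come from an embedding model such as
    text-embedding-ada-002; plain lists stand in for them here.
    """
    best_response, best_sim = None, -1.0
    for key_vec, response in cache:
        sim = cosine_similarity(query_vec, key_vec)
        if sim > best_sim:
            best_response, best_sim = response, sim
    return best_response if best_sim >= THRESHOLD else None
```

A query vector nearly parallel to a cached key reuses that entry, while a dissimilar one falls through to fresh generation (returning `None` here).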