Towards shutdownable agents via stochastic choice

Authors: Elliott Thornley, Alexander Roman, Christos Ziakas, Louis Thomson, Leyton Ho

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose evaluation metrics for usefulness and neutrality. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be useful and neutral. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be useful and neutral.
Researcher Affiliation | Academia | Elliott Thornley (Massachusetts Institute of Technology, EMAIL); Alexander Roman (New College of Florida, EMAIL); Christos Ziakas (Imperial College London, EMAIL); Leyton Ho (Brown University); Louis Thomson (Independent)
Pseudocode | No | The paper describes the DReST reward function and references the REINFORCE algorithm, but it does not present any formal pseudocode blocks or structured algorithms in the text or appendices.
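Since the paper presents no pseudocode, the referenced algorithm can only be sketched from its citation. Below is a minimal, hedged illustration of tabular REINFORCE (Williams, 1992) in general form. This is not the authors' implementation: no code is released, the environment and reward values here are stand-ins for the gridworld/DReST setup, and the softmax preference table is an assumption made for differentiability (the paper reports using an ϵ-greedy policy for exploration).

```python
import math

def softmax(prefs):
    """Convert a list of action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_episode(logits, episode, alpha=0.1, gamma=0.95):
    """One tabular REINFORCE update from a list of (state, action, reward) steps.

    `logits` maps state -> list of action preferences (the tabular policy).
    Works backwards through the episode, accumulating the discounted return G
    and nudging each visited (state, action) preference along the log-policy
    gradient, scaled by G.
    """
    G = 0.0
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        probs = softmax(logits[state])
        # Gradient of log softmax w.r.t. each preference:
        # 1[a == action] - pi(a | state).
        for a in range(len(probs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[state][a] += alpha * G * grad
    return logits
```

After an episode in which action 1 was rewarded in state 0, the update raises that action's preference relative to the alternative, which is the qualitative behavior the algorithm is meant to produce.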
Open Source Code | No | The paper does not explicitly state that source code is released, nor does it link to a code repository or mention code in supplementary materials.
Open Datasets | No | The paper uses custom-designed "gridworlds containing coins and a shutdown-delay button" as environments for its experiments. Figure 2 shows an "Example gridworld," and Appendix E shows "a varied collection of eight gridworlds." These are custom environments/datasets, but no public access information (link, DOI, citation) is provided for them.
Dataset Splits | No | The paper describes the training process for its reinforcement-learning agents: "We train our DReST agents with 64 mini-episodes in each of 2,048 meta-episodes, for a total of 131,072 mini-episodes," and "We train 10 agents using the default reward function (blue) and 10 agents using the DReST reward function (orange)." These figures describe episode and agent counts, not train/validation/test splits of a static dataset; such splits are standard in supervised learning but do not apply in this form to reinforcement-learning environments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions using "a tabular version of the REINFORCE algorithm (Williams, 1992)" but does not specify any other software libraries, frameworks, or version numbers needed for replication.
Experiment Setup | Yes | We train our DReST agents with 64 mini-episodes in each of 2,048 meta-episodes, for a total of 131,072 mini-episodes. We choose λ = 0.9 for the base of the DReST discount factor, and γ = 0.95 for the temporal discount factor. We exponentially decay the learning rate from 0.25 to 0.01 over the course of 65,536 mini-episodes. We use an ϵ-greedy policy to avoid entropy collapse, and exponentially decay ϵ from 0.5 to 0.001 over the course of 65,536 mini-episodes.
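The decay schedules quoted above can be made concrete with a small sketch. The paper states only the endpoints and the horizon ("exponentially decay ... from 0.25 to 0.01" and "from 0.5 to 0.001," each over 65,536 mini-episodes), so the geometric interpolation below, and the `exp_decay` helper name, are assumptions about the functional form rather than the authors' actual code.

```python
DECAY_STEPS = 65_536  # horizon stated in the paper's setup

def exp_decay(start: float, end: float, step: int, total: int = DECAY_STEPS) -> float:
    """Geometric (exponential) interpolation from `start` to `end` over `total` steps.

    At step 0 the value is `start`; at `step >= total` it is held at `end`.
    """
    frac = min(step, total) / total
    return start * (end / start) ** frac

# Learning rate schedule: 0.25 -> 0.01 over 65,536 mini-episodes.
lr_start = exp_decay(0.25, 0.01, 0)
lr_end = exp_decay(0.25, 0.01, DECAY_STEPS)

# Epsilon schedule: 0.5 -> 0.001 over 65,536 mini-episodes.
eps_start = exp_decay(0.5, 0.001, 0)
eps_end = exp_decay(0.5, 0.001, DECAY_STEPS)
```

One property of a geometric schedule worth noting: the value halves at regular step intervals, so early mini-episodes see much faster absolute decay than later ones, which matches the usual motivation for exponential (rather than linear) annealing of exploration.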