Learning to Be Cautious
Authors: Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present both a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious. ... Experiments and results are shown in Section 4, and our theoretical contribution extending k-of-N CFR to continuing MDPs is provided in Section 5. |
| Researcher Affiliation | Collaboration | Montaser Mohammedalamen EMAIL University of Alberta; Alberta Machine Intelligence Institute (Amii) Dustin Morrill EMAIL Sony AI Alexander Sieusahai EMAIL University of Alberta Yash Satsangi EMAIL Independent Researcher* Michael Bowling EMAIL University of Alberta; Alberta Machine Intelligence Institute (Amii) |
| Pseudocode | Yes | Algorithm 1 Learning to Be Cautious ... Algorithm 2 Learning to Be Cautious |
| Open Source Code | Yes | Our code is available at https://github.com/montaserFath/Learning-to-be-Cautious. |
| Open Datasets | Yes | The images are hand-drawn digits from MNIST (LeCun et al., 1998)... a shoe image from MNIST fashion (Xiao et al., 2017)... or a letter from EMNIST (Cohen et al., 2017) |
| Dataset Splits | Yes | The familiar states are the 60K training images in the MNIST digit dataset... We construct novel MDPs from the MNIST fashion test set (Xiao et al., 2017) and EMNIST letters test set (Cohen et al., 2017)... training data are [1%, 10%, 100%] of the full digit dataset. |
| Hardware Specification | Yes | For all MNIST experiments, we used an NVIDIA Tesla V100 GPU and a 2.2 GHz Intel Xeon CPU with 100 GB memory. ... This experiment was run on a 3.60GHz Intel Core i9-9900K CPU with 7.7 GB of memory without a GPU. |
| Software Dependencies | No | PyTorch (Paszke et al., 2019) is used to build and train all neural networks. ... Adam optimizer (Kingma & Ba, 2015)... |
| Experiment Setup | Yes | Table 1: The batch size and number of epochs used to train the neural network reward models for each setting in the how caution depends on the extent of training data experiment. ... Networks are trained to minimize the mean-squared error (MSE) between reward predictions and target rewards with the Adam optimizer (Kingma & Ba, 2015) using a learning rate of 0.0016 (we also try 0.01, and 0.001). The remaining parameters for Adam in PyTorch (β1, β2, ϵ, and weight decay) are set to their defaults (0.9, 0.999, 10⁻⁸, 0)... We use a discount factor of γ = 0.99. |
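To make the reported optimizer settings concrete, here is a minimal pure-Python sketch of an Adam update using the hyperparameters quoted above (learning rate 0.0016; β1 = 0.9, β2 = 0.999, ε = 10⁻⁸, weight decay 0), applied to fitting a one-parameter reward model by MSE. This is an illustration of the stated configuration only, not the paper's actual PyTorch training code; the toy model and target are invented for the example.

```python
import math

# Hyperparameters as reported in the paper's setup.
LR, BETA1, BETA2, EPS, GAMMA = 0.0016, 0.9, 0.999, 1e-8, 0.99

def adam_step(theta, grad, m, v, t):
    """One Adam update (bias-corrected) with the reported hyperparameters."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)          # bias-corrected second moment
    theta = theta - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v

# Toy example: fit a scalar reward model r(x) = w * x to the target 2x
# by minimizing the squared error, as in the paper's MSE objective.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    x, target = 1.0, 2.0
    grad = 2 * (w * x - target) * x       # d/dw of (w*x - target)^2
    w, m, v = adam_step(w, grad, m, v, t)
```

In PyTorch this corresponds to `torch.optim.Adam(params, lr=0.0016)` with the remaining arguments left at their defaults, which match the quoted values.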