Inverse decision-making using neural amortized Bayesian actors

Authors: Dominik Straub, Tobias Fabian Niehues, Jan Peters, Constantin Rothkopf

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show on synthetic data that the inferred posterior distributions are in close alignment with those obtained using analytical solutions where they exist. Where no analytical solution is available, we recover posterior distributions close to the ground truth. We then show how our method allows for principled model comparison and how it can be used to disentangle factors that may lead to unidentifiabilities between priors and costs. Finally, we apply our method to empirical data from three sensorimotor tasks and compare model fits with different cost functions to show that it can explain individuals' behavioral patterns.
Researcher Affiliation | Academia | ¹Institute of Psychology, Technical University of Darmstadt; ²Centre for Cognitive Science, Technical University of Darmstadt; ³Department of Computer Science, Technical University of Darmstadt & Hessian Center for Artificial Intelligence. {firstname.lastname}@tu-darmstadt.de
Pseudocode | Yes | Appendix A (Algorithm): Algorithm 1 (Train neural network to approximate a Bayesian actor model) ... Algorithm 2 (Bayesian inference about the parameters of a Bayesian actor model)
Open Source Code | Yes | Our implementation is publicly available at https://github.com/RothkopfLab/naba. Our software package enables the user to define new parametric families of cost functions, train neural networks to approximate the decision-making problem, and perform Bayesian inference about its parameters.
Open Datasets | Yes | Finally, we apply our method to human behavioral data from three different experiments and show that the inferred cost functions explain the previously mentioned typical behavioral patterns not only in synthetically generated but also empirically observed data. ... The tasks used in our evaluation are a bean bag (BB) throwing task (Willey & Liu, 2018), a puck sliding (PU) task (Neupärtl et al., 2020) and a force reproduction (FOR) task (Onneweer et al., 2016).
Dataset Splits | No | The paper describes generating synthetic datasets of specific sizes (e.g., 'We generated a dataset of 60 pairs of stimuli si and responses ri', 'simulated a dataset consisting of 60 trials', 'simulated 45 trials for each level'). For evaluating convergence of the neural network approximator, it mentions 'an evaluation dataset consisting of 100,000 parameter sets'. However, it does not specify explicit training/test/validation splits for the empirical data or for evaluating the primary model inference method's generalization performance on these datasets.
Hardware Specification | No | We gratefully acknowledge the computing time provided to us on the high-performance computer Lichtenberg at the NHR Centers NHR4CES at TU Darmstadt.
Software Dependencies | No | The method was implemented in jax (Frostig et al., 2018), using the packages equinox (Kidger & Garcia, 2021) for neural networks and numpyro (Phan et al., 2019) for probabilistic modeling and inference.
Experiment Setup | Yes | We used the RMSProp optimizer with a learning rate of 10^-4, batch size of 256, and N = M = 128 Monte Carlo samples per evaluation of the stochastic training objective. The networks were trained for 500,000 steps... We used a multi-layer perceptron with 4 hidden layers and 16, 64, 16, 8 nodes in the hidden layers, respectively. We used swish activation functions at the hidden layers... To sample from the researcher's posterior distribution over the subject's model parameters p(θ | D), we use the Hamiltonian Monte Carlo algorithm NUTS (Hoffman et al., 2014)... We drew 20,000 samples from the posterior distribution in 4 chains, each with 5,000 warmup steps.
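A minimal sketch of the stated approximator architecture, written in plain jax for illustration (the paper's actual implementation uses equinox). Only the hidden widths (16, 64, 16, 8) and the swish activations come from the quoted setup; the input/output dimensionalities and the initialization scheme are assumptions.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # One (W, b) pair per consecutive pair of layer sizes;
    # the He-style weight scaling is an illustrative assumption.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    # Swish at the 4 hidden layers, linear readout at the output.
    for W, b in params[:-1]:
        x = jax.nn.swish(x @ W + b)
    W, b = params[-1]
    return x @ W + b

# Hypothetical 3-dim input and scalar output; hidden widths 16, 64, 16, 8
# as reported in the paper, batch size 256 as used during training.
params = init_mlp(jax.random.PRNGKey(0), [3, 16, 64, 16, 8, 1])
out = mlp_apply(params, jnp.ones((256, 3)))
```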