Quantifying the Self-Interest Level of Markov Social Dilemmas

Authors: Richard Willis, Yali Du, Joel Z. Leibo, Michael Luck

IJCAI 2025

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — "We demonstrate our method on three environments from the Melting Pot suite... Our results illustrate how reward exchange can enable agents to transition from selfish to collective equilibria... This paper presents a novel method for empirically estimating the self-interest level of Markov game representations of social dilemmas using multi-agent reinforcement learning (MARL). Our primary contributions are twofold: we present a novel quantitative method for determining the self-interest level... and we provide more comprehensive experimental results on three environments featuring larger numbers of agents from the Melting Pot suite [Leibo et al., 2021]."
Researcher Affiliation: Collaboration — 1King's College London, 2Google DeepMind, 3University of Sussex. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode: No — The paper describes its methods and procedures in narrative form within Section 4 ('Method') and its subsections, without structured pseudocode or algorithm blocks.
Open Source Code: Yes — "See https://github.com/willis-richard/meltingpot/tree/markov_sd for further details."
Open Datasets: Yes — "We evaluate our approach using three environments from the Melting Pot suite [Leibo et al., 2021]: Commons Harvest, Clean Up, and Externality Mushrooms."
Dataset Splits: No — The paper specifies episode length and training duration for the Melting Pot environments (e.g., 'episode length to 2000 timesteps', 'train for 9000 episodes (18 million environment steps)'), but does not provide explicit training/validation/test splits with percentages, sample counts, or specific files, as typically defined for static datasets.
Hardware Specification: No — The acknowledgments state that 'Compute resources were provided by King's College London [King's College London e-Research team, 2024]', but the paper does not specify GPU models, CPU models, memory configurations, or other hardware details used for the experiments.
Software Dependencies: No — The paper names Proximal Policy Optimisation (PPO) as the learning algorithm, but provides no version numbers for its implementation or for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used.
Experiment Setup: Yes — "For all environments, we fix the episode length to 2000 timesteps, and we modify the observation space by compressing each grid cell from 8x8 pixels to a single pixel... For our experiments, we use five random seeds and train for 9000 episodes (18 million environment steps) at each stage of the curriculum. We use a range of self-interest values... The ratios we use are [20:1, 10:1, 5:1, 3:1, 5:2, 2:1, 5:3, 4:3, 1:1]. We use a p-value threshold of 0.1 for Dunnett's test."
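To make the role of the listed ratios concrete, the following is a minimal sketch of one plausible reading of the reward-exchange scheme the paper describes: a ratio a:b is assumed (hypothetically; the paper's exact formulation may differ) to mean that each agent weights its own reward by a and each other agent's reward by b, with weights normalized to sum to one.

```python
def exchanged_rewards(rewards, own, other):
    """Mix each agent's reward with the others' under an own:other ratio.

    Hypothetical reading of the paper's reward exchange: with ratio
    own:other, agent i's mixed reward is
        (own * r_i + other * sum_{j != i} r_j) / (own + other * (n - 1)),
    so a 1:1 ratio gives every agent the mean reward (fully collective),
    while a large ratio such as 20:1 leaves rewards mostly selfish.
    """
    n = len(rewards)
    weight_total = own + other * (n - 1)
    grand_total = sum(rewards)
    return [
        (own * r + other * (grand_total - r)) / weight_total
        for r in rewards
    ]


# Example with three agents, where only agent 0 earned reward:
selfish = exchanged_rewards([4.0, 0.0, 0.0], 20, 1)   # near-selfish ratio
shared = exchanged_rewards([4.0, 0.0, 0.0], 1, 1)     # fully collective
```

Note that this normalization conserves the total reward across agents, so the mixing only redistributes incentives rather than changing the scale of returns.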