Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Authors: Yun Qu, Yuhang Jiang, Boyuan Wang, Yixiu Mao, Cheems Wang, Chang Liu, Xiangyang Ji

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results witness that LaRe (i) achieves superior temporal credit assignment to SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground-truth rewards for certain tasks. We evaluate LaRe on two widely used benchmarks in both single-agent and multi-agent settings: the MuJoCo locomotion benchmark (Todorov, Erez, and Tassa 2012) and the Multi-Agent Particle Environment (MPE) (Lowe et al. 2017). Additionally, we perform ablation studies and further analyses to validate LaRe's components and assess its properties.
Researcher Affiliation | Academia | Tsinghua University
Pseudocode | Yes | Algorithm 1: LaRe. Input: LLM M, task information task, role instruction role, number of candidate responses n, pre-collected random state-action pairs s, max episodes N_max. Output: policy network πθ, reward decoder model fψ.
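The algorithm's signature can be sketched in Python as follows. This is a minimal illustrative sketch, not the paper's implementation: `llm`, `rollout`, and `verify_pairs` are hypothetical stand-ins, and the trained reward decoder fψ and the policy update for πθ are elided (a single verified candidate is used directly as the dense reward).

```python
def la_re_sketch(llm, task, role, n_candidates, verify_pairs,
                 rollout, max_episodes):
    """Hedged sketch of Algorithm 1 (LaRe): LLM-proposed latent rewards
    with self-verification, then episodic rollouts under dense rewards."""
    # 1. Prompt the LLM n times for candidate latent-reward functions.
    candidates = [llm(task, role) for _ in range(n_candidates)]

    # 2. Self-verification: keep only candidates that execute without
    #    error on pre-collected random state-action pairs.
    valid = []
    for fn in candidates:
        try:
            for s, a in verify_pairs:
                float(fn(s, a))
            valid.append(fn)
        except Exception:
            continue
    if not valid:
        raise RuntimeError("no candidate passed self-verification")

    # 3. The paper trains a reward decoder f_psi from the verified
    #    candidates; as a simplification we use the first verified
    #    candidate as the per-step dense reward.
    reward_fn = valid[0]
    episode_returns = []
    for _ in range(max_episodes):
        total = sum(reward_fn(s, a) for s, a in rollout())
        episode_returns.append(total)
        # policy update for pi_theta with dense latent rewards goes here
    return reward_fn, episode_returns
```

A stubbed `llm` returning a fixed reward function and a fixed `rollout` suffice to exercise the control flow end to end.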
Open Source Code | Yes | Our code is available at https://github.com/thu-rllab/LaRe
Open Datasets | Yes | We evaluate LaRe on two widely used benchmarks in both single-agent and multi-agent settings: the MuJoCo locomotion benchmark (Todorov, Erez, and Tassa 2012) and the Multi-Agent Particle Environment (MPE) (Lowe et al. 2017). Additionally, we perform ablation studies and further analyses to validate LaRe's components and assess its properties. Moreover, we evaluate LaRe in more complex scenarios from SMAC (Samvelyan et al. 2019) and a newly designed task, Triangle Area, in Appendix D and E.
Dataset Splits | No | The paper mentions using specific environments and benchmarks (MuJoCo, MPE, SMAC) and runs each algorithm over repeated trials, but it does not report conventional train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using "GPT-4o from OpenAI API" but does not list specific version numbers for other key software components such as programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other solvers.
Experiment Setup | No | The paper states that "Further details and results are available in the Appendix" regarding experimental setups and baselines, implying that specific hyperparameters, training configurations, or system-level settings are not detailed in the main text.