Differentiable Information Enhanced Model-Based Reinforcement Learning

Authors: Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To validate the effectiveness of our approach in differentiable environments, we provide theoretical analysis and empirical results. Notably, our approach outperforms previous model-based and model-free methods on multiple challenging tasks, including motion control of controllable rigid robots such as humanoids and deformable object manipulation.
Researcher Affiliation Academia 1 Institute for Artificial Intelligence, Peking University 2 State Key Laboratory of General Artificial Intelligence, Peking University, Beijing, China 3 State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China 4 Institute of Automation, Chinese Academy of Sciences
Pseudocode Yes We list the pseudocode in Algorithm 1 (Algorithm 1: MB-MIX).
Open Source Code No The paper does not contain any explicit statements about releasing code, nor does it provide a link to a code repository.
Open Datasets Yes We then conducted experiments on two benchmarks, DiffRL (Xu et al. 2021) and Brax (Freeman et al. 2021), which contain classic robot control problems. Moreover, in DaXBench (Chen et al. 2022), we demonstrated the effectiveness of our method in differentiable deformable-object environments with large state and action spaces.
Dataset Splits No The paper mentions environments/benchmarks such as DiffRL, Brax, and DaXBench, but it does not specify any dataset splits (e.g., train/validation/test percentages or counts) within these environments that would be needed for reproduction.
Hardware Specification No The paper mentions the 'Bruce' humanoid robot as an experimental subject but does not provide details on the hardware used to run the experiments or train the models (e.g., GPU/CPU models, memory).
Software Dependencies No The paper mentions various algorithms and methods (e.g., SHAC, PPO, SAC, Dreamer V3) but does not list any specific software libraries, frameworks, or operating system versions with their respective version numbers.
Experiment Setup Yes In the experiments, our MB-MIX algorithm was trained on all six tasks, with λ = 0.98 and the mix-interval set to 1 or 2 depending on the task. The state and action spaces have dimensions 20 and 5, respectively, yielding a reward matrix R ∈ ℝ^{20×5}. The initial policy is a matrix θ_0 ∈ ℝ^{20×5}, and the final policy π_θ is obtained via a softmax activation: π_θ(a|s) = exp(θ(s, a)) / Σ_b exp(θ(s, b)). Fewer parallel environments (4 and 8) were used to highlight the sample efficiency of model-based methods.
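As a minimal sketch of the softmax policy parameterization described in the setup above (assuming NumPy; the dimensions 20 states × 5 actions follow the paper's stated example, while the variable names and random initialization are illustrative assumptions, not the authors' code):

```python
import numpy as np

# Dimensions from the paper's tabular example: 20 states, 5 actions.
N_STATES, N_ACTIONS = 20, 5

# Initial policy parameters theta_0 in R^{20x5} (random init is an assumption).
rng = np.random.default_rng(0)
theta0 = rng.normal(size=(N_STATES, N_ACTIONS))

def softmax_policy(theta: np.ndarray) -> np.ndarray:
    """pi_theta(a|s) = exp(theta[s, a]) / sum_b exp(theta[s, b]), row-wise."""
    logits = theta - theta.max(axis=1, keepdims=True)  # for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)

pi = softmax_policy(theta0)  # shape (20, 5); each row sums to 1
```

Each row of `pi` is a valid probability distribution over the 5 actions for one state, matching the softmax formula in the quoted setup.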