Imagination-Limited Q-Learning for Offline Reinforcement Learning

Authors: Wenhui Liu, Zhijian Wu, Jingchao Wang, Dingjiang Huang, Shuigeng Zhou

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our method achieves state-of-the-art performance on a wide range of tasks in the D4RL benchmark. In this section, we empirically validate the effectiveness of our method ILQ. 1) We demonstrate the superiority of ILQ over existing methods by comparing performance across a series of tasks. 2) We conduct sensitivity analyses on the hyperparameters involved in ILQ, confirming the stability of the proposed method. 3) We then perform ablation experiments on both the imagination and limitation components to verify their impact."
Researcher Affiliation | Academia | ¹East China Normal University, ²Fudan University
Pseudocode | Yes | Algorithm 1: Imagination-Limited Q-Learning (ILQ). Require: the offline dataset D, number of iterations N, discount factor γ, target network update rate τ, trade-off factor η, and offset parameter δ.
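The Require line above names ILQ's inputs but not its update rule, which the paper defers to the appendix. The sketch below is therefore only a generic offline Q-learning skeleton showing where each listed hyperparameter (γ, τ, η, δ) enters a training loop; `ilq_target` is a hypothetical placeholder standing in for the paper's actual imagination-limited target, not a reproduction of it.

```python
import random

def soft_update(target, online, tau):
    # Polyak averaging at rate tau: target <- tau * online + (1 - tau) * target.
    for k in online:
        target[k] = tau * online[k] + (1 - tau) * target[k]

def ilq_target(r, q_next, gamma, eta, delta):
    # Hypothetical placeholder: blend an optimistic ("imagined") bootstrap
    # with a delta-clipped ("limited") one via eta. The real ILQ target is
    # defined in the paper's appendix and differs from this stand-in.
    imagined = r + gamma * q_next
    limited = min(imagined, q_next + delta)
    return eta * imagined + (1 - eta) * limited

def train(dataset, n_iters, gamma=0.99, tau=0.005, eta=0.5, delta=0.1, lr=0.1):
    # Tabular Q over (state, action) pairs appearing in the offline dataset D.
    q = {(s, a): 0.0 for (s, a, r, s2) in dataset}
    for (s, a, r, s2) in dataset:
        q.setdefault((s2, a), 0.0)  # ensure next-state entries exist
    q_targ = dict(q)
    for _ in range(n_iters):
        s, a, r, s2 = random.choice(dataset)        # sample a transition from D
        y = ilq_target(r, q_targ[(s2, a)], gamma, eta, delta)
        q[(s, a)] += lr * (y - q[(s, a)])           # TD step toward the target
        soft_update(q_targ, q, tau)                 # rate-tau target update
    return q
```

On a toy two-state dataset such as `[("s0", 0, 1.0, "s1"), ("s1", 0, 0.0, "s0")]`, a few hundred iterations of `train` propagate the reward into positive Q-values, illustrating the loop structure only.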
Open Source Code | Yes | The Appendix is available at https://github.com/LiuWH-AI/ILQ.
Open Datasets | Yes | We evaluate ILQ on the D4RL [Fu et al., 2020] benchmark. The commonly used domain is Gym MuJoCo (-v2), including halfcheetah, hopper, and walker2d tasks at four levels: random (r), medium (m), medium-replay (mr), and medium-expert (me). We also assess ILQ on the Maze2D (-v1) domain, which offers three layouts with two reward types each, i.e., umaze (u), umaze-dense (ud), medium (m), medium-dense (md), large (l), and large-dense (ld). In addition, comparisons on several Adroit (-v0) tasks are conducted.
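The Gym MuJoCo and Maze2D portions of the evaluation suite follow D4RL's regular `env-level-version` naming, so the full task lists can be enumerated mechanically. A small sketch (task-name strings only, assuming the versions stated above):

```python
# Gym MuJoCo: three environments at four dataset levels, all at v2.
ENVS = ["halfcheetah", "hopper", "walker2d"]
LEVELS = ["random", "medium", "medium-replay", "medium-expert"]
mujoco_tasks = [f"{env}-{level}-v2" for env in ENVS for level in LEVELS]

# Maze2D: three layouts, each with sparse and dense reward variants, at v1.
MAZE_LAYOUTS = ["umaze", "medium", "large"]
maze2d_tasks = [f"maze2d-{layout}{suffix}-v1"
                for layout in MAZE_LAYOUTS
                for suffix in ("", "-dense")]
```

This yields 12 Gym MuJoCo identifiers (e.g., "halfcheetah-medium-expert-v2") and 6 Maze2D identifiers (e.g., "maze2d-umaze-dense-v1"), matching the task counts described above.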
Dataset Splits | No | The paper uses standard, pre-configured D4RL tasks (e.g., halfcheetah-me, hopper-mr) but does not describe any further training/validation/test splits performed by the authors within these datasets for their experiments.
Hardware Specification | No | The paper does not report the hardware used to run the experiments, such as GPU/CPU models or processor types.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA) needed to replicate the experiments. It notes that "Details of hyperparameters settings and implementation specifics are also provided in the Appendix," which may cover this, but the information is not in the main text.
Experiment Setup | No | "Details of hyperparameters settings and implementation specifics are also provided in the Appendix to ensure reproducibility." The main text reports sensitivity analyses for some parameters (δ, η) but does not provide a comprehensive list of all hyperparameters and their chosen values for the main experimental results.