Imagination-Limited Q-Learning for Offline Reinforcement Learning

Authors: Wenhui Liu, Zhijian Wu, Jingchao Wang, Dingjiang Huang, Shuigeng Zhou

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our method achieves state-of-the-art performance on a wide range of tasks in the D4RL benchmark. In this section, we empirically validate the effectiveness of our method ILQ. 1) We demonstrate the superiority of ILQ over existing methods by comparing performance across a series of tasks. 2) We conduct sensitivity analyses on the hyperparameters involved in ILQ, confirming the stability of the proposed method. 3) We then perform ablation experiments on both the imagination and limitation components to verify their impact."
Researcher Affiliation | Academia | ¹East China Normal University, ²Fudan University
Pseudocode | Yes | Algorithm 1: Imagination-Limited Q-Learning (ILQ). Require: the offline dataset D, number of iterations N, discount factor γ, target network update rate τ, trade-off factor η, and offset parameter δ.
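The Require line above names ILQ's inputs but not its update rule, which the paper defers to the appendix. The sketch below is therefore only a generic offline Q-learning skeleton showing where each listed hyperparameter (γ, τ, η, δ) enters a training loop; `ilq_target` is a hypothetical placeholder standing in for the paper's actual imagination-limited target, not a reproduction of it.

```python
import random

def soft_update(target, online, tau):
    # Polyak averaging at rate tau: target <- tau * online + (1 - tau) * target.
    for k in online:
        target[k] = tau * online[k] + (1 - tau) * target[k]

def ilq_target(r, q_next, gamma, eta, delta):
    # Hypothetical placeholder: blend an optimistic ("imagined") bootstrap
    # with a delta-clipped ("limited") one via eta. The real ILQ target is
    # defined in the paper's appendix and differs from this stand-in.
    imagined = r + gamma * q_next
    limited = min(imagined, q_next + delta)
    return eta * imagined + (1 - eta) * limited

def train(dataset, n_iters, gamma=0.99, tau=0.005, eta=0.5, delta=0.1, lr=0.1):
    # Tabular Q over (state, action) pairs appearing in the offline dataset D.
    q = {(s, a): 0.0 for (s, a, r, s2) in dataset}
    for (s, a, r, s2) in dataset:
        q.setdefault((s2, a), 0.0)  # ensure next-state entries exist
    q_targ = dict(q)
    for _ in range(n_iters):
        s, a, r, s2 = random.choice(dataset)        # sample a transition from D
        y = ilq_target(r, q_targ[(s2, a)], gamma, eta, delta)
        q[(s, a)] += lr * (y - q[(s, a)])           # TD step toward the target
        soft_update(q_targ, q, tau)                 # rate-tau target update
    return q
```

On a toy two-state dataset such as `[("s0", 0, 1.0, "s1"), ("s1", 0, 0.0, "s0")]`, a few hundred iterations of `train` propagate the reward into positive Q-values, illustrating the loop structure only.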
Open Source Code | Yes | The Appendix is available at https://github.com/LiuWH-AI/ILQ.
Open Datasets | Yes | We evaluate ILQ on the D4RL [Fu et al., 2020] benchmark. The commonly used domain is Gym MuJoCo (-v2), including halfcheetah, hopper, and walker2d tasks at four levels: random (r), medium (m), medium-replay (mr), and medium-expert (me). We also assess ILQ on the Maze2D (-v1) domain, which offers three layouts with two reward types each, i.e., umaze (u), umaze-dense (ud), medium (m), medium-dense (md), large (l), and large-dense (ld). In addition, comparisons on several Adroit (-v0) tasks are conducted.
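The Gym MuJoCo and Maze2D portions of the evaluation suite follow D4RL's regular `env-level-version` naming, so the full task lists can be enumerated mechanically. A small sketch (task-name strings only, assuming the versions stated above):

```python
# Gym MuJoCo: three environments at four dataset levels, all at v2.
ENVS = ["halfcheetah", "hopper", "walker2d"]
LEVELS = ["random", "medium", "medium-replay", "medium-expert"]
mujoco_tasks = [f"{env}-{level}-v2" for env in ENVS for level in LEVELS]

# Maze2D: three layouts, each with sparse and dense reward variants, at v1.
MAZE_LAYOUTS = ["umaze", "medium", "large"]
maze2d_tasks = [f"maze2d-{layout}{suffix}-v1"
                for layout in MAZE_LAYOUTS
                for suffix in ("", "-dense")]
```

This yields 12 Gym MuJoCo identifiers (e.g., "halfcheetah-medium-expert-v2") and 6 Maze2D identifiers (e.g., "maze2d-umaze-dense-v1"), matching the task counts described above.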
Dataset Splits | No | The paper uses standard, pre-configured D4RL tasks (e.g., halfcheetah-me, hopper-mr) but does not describe any further training/validation/test splits performed by the authors within these datasets for their experiments.
Hardware Specification | No | The paper does not report the hardware used to run the experiments, such as GPU/CPU models or processor types.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA) needed to replicate the experiments. It notes that "Details of hyperparameters settings and implementation specifics are also provided in the Appendix," which may cover this, but the information is not in the main text.
Experiment Setup | No | "Details of hyperparameters settings and implementation specifics are also provided in the Appendix to ensure reproducibility." The main text reports sensitivity analyses for some parameters (δ, η) but does not provide a comprehensive list of all hyperparameters and their chosen values for the main experimental results.