Conservative Offline Goal-Conditioned Implicit V-Learning
Authors: Kaiqiang Ke, Qian Lin, Zongkai Liu, Shenghong He, Chao Yu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks. ... Empirically, experiments on OGBench (Park et al., 2024), a benchmark specifically designed for offline GCRL, demonstrate that our algorithm consistently matches or surpasses state-of-the-art methods across distinct environments with varying configurations. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China 2Shanghai Innovation Institute, Shanghai, China 3Pengcheng Laboratory, Shenzhen, China. Correspondence to: Chao Yu <EMAIL>. |
| Pseudocode | Yes | The pseudocode of CGCIVL is shown in Algorithm 1. ... Algorithm 1 CGCIVL |
| Open Source Code | Yes | Our algorithm implementation is based on the reproduction of HIQL in the open-source OGBench and can be found at https://github.com/kkq2018/CGCIVL.git. |
| Open Datasets | Yes | We evaluate the proposed algorithm on OGBench (Park et al., 2024), a benchmark designed to evaluate algorithms in offline GCRL across diverse tasks and datasets. ... The dataset was collected using a noisy expert policy that navigates the maze by sequentially reaching randomly sampled goals. This dataset is used to evaluate navigation performance. ... medium maze in D4RL (Fu et al., 2020). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with percentages, sample counts, or references to predefined splits for reproduction. It describes how data is collected and how policies are evaluated, but not how the datasets are formally split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions that the algorithm implementation is based on 'HIQL in the open-source OGbench' but does not specify any particular software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | The detailed hyperparameter settings are shown in Table 2. The training process employed a batch size of 1024, with the policy and value networks designed as MLPs of dimensions (256, 256) and (512, 512, 512), respectively. The GELU activation function was used to ensure smooth gradient flow, while the Adam optimizer, configured with a learning rate of 0.0003, facilitated efficient parameter updates. To further stabilize the training process, the target network smoothing coefficient is set to 0.005. ... In the antmaze-giant-stitch-v0 environment, the algorithm was trained for 2,000,000 steps. In the humanoidmaze-giant-navigate-v0 and humanoidmaze-giant-stitch-v0 environments, the algorithm was trained for 3,000,000 steps. For all other environments, the training steps were set to 1,000,000, consistent with the settings in OGbench. |
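The reported setup can be sketched as a configuration fragment together with the Polyak (soft) target-network update implied by the smoothing coefficient of 0.005. This is a minimal illustration assuming standard conventions; the variable and key names are illustrative and not taken from the released CGCIVL code.

```python
# Hyperparameters as reported in the paper (Table 2); key names are assumptions.
CONFIG = {
    "batch_size": 1024,
    "policy_hidden_dims": (256, 256),
    "value_hidden_dims": (512, 512, 512),
    "activation": "gelu",
    "optimizer": "adam",
    "learning_rate": 3e-4,
    "target_tau": 0.005,  # target-network smoothing coefficient
}

def soft_update(target_params, online_params, tau=CONFIG["target_tau"]):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]

# Example: one smoothing step pulls the target slightly toward the online value.
print(soft_update([0.0], [1.0]))  # -> [0.005]
```

With tau = 0.005, the target network tracks the online network as an exponential moving average, which is the usual way such a smoothing coefficient stabilizes value-function training.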