Conservative Offline Goal-Conditioned Implicit V-Learning

Authors: Kaiqiang Ke, Qian Lin, Zongkai Liu, Shenghong He, Chao Yu

ICML 2025

Reproducibility assessment (Variable / Result / LLM Response):

Variable: Research Type
Result: Experimental
LLM Response: "Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks. ... Empirically, experiments on OGBench (Park et al., 2024), a benchmark specifically designed for offline GCRL, demonstrate that our algorithm consistently matches or surpasses state-of-the-art methods across distinct environments with varying configurations."

Variable: Researcher Affiliation
Result: Academia
LLM Response: "1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; 2 Shanghai Innovation Institute, Shanghai, China; 3 Pengcheng Laboratory, Shenzhen, China. Correspondence to: Chao Yu <EMAIL>."

Variable: Pseudocode
Result: Yes
LLM Response: "The pseudocode of CGCIVL is shown in Algorithm 1. ... Algorithm 1 CGCIVL"

Variable: Open Source Code
Result: Yes
LLM Response: "Our algorithm implementation is based on the reproduction of HIQL in the open-source OGbench and can be found at https://github.com/kkq2018/CGCIVL.git."

Variable: Open Datasets
Result: Yes
LLM Response: "We evaluate the proposed algorithm on OGbench (Park et al., 2024), a benchmark designed to evaluate algorithms in offline GCRL across diverse tasks and datasets. ... The dataset collected using a noisy expert policy that navigates the maze by sequentially reaching randomly sampled goals. This dataset is used to evaluate navigation performance. ... medium maze in D4RL (Fu et al., 2020)."

Variable: Dataset Splits
Result: No
LLM Response: The paper does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or references to predefined splits that would allow reproduction. It describes how data is collected and how policies are evaluated, but not how the datasets are formally split.

Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments.

Variable: Software Dependencies
Result: No
LLM Response: The paper states that the algorithm implementation is based on "HIQL in the open-source OGbench" but does not specify any particular software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).

Variable: Experiment Setup
Result: Yes
LLM Response: "The detailed hyperparameter settings are shown in Table 2. The training process employed a batch size of 1024, with the policy and value networks designed as MLPs of dimensions (256, 256) and (512, 512, 512), respectively. The GELU activation function was used to ensure smooth gradient flow, while the Adam optimizer, configured with a learning rate of 0.0003, facilitated efficient parameter updates. To further stabilize the training process, the target network smoothing coefficient is set to 0.005. ... In the antmaze-giant-stitch-v0 environment, the algorithm was trained for 2,000,000 steps. In the humanoidmaze-giant-navigate-v0 and humanoidmaze-giant-stitch-v0 environments, the algorithm was trained for 3,000,000 steps. For all other environments, the training steps were set to 1,000,000, consistent with the settings in OGbench."
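The experiment-setup quote above can be condensed into a small configuration sketch. This is a minimal illustration built only from the hyperparameters reported in the paper; the names `CONFIG` and `soft_update` are illustrative and do not come from the paper's released code. The target-network smoothing coefficient of 0.005 corresponds to the standard Polyak-averaging update shown below.

```python
# Hyperparameters as reported in the paper's Table 2 (CONFIG is an
# illustrative name, not taken from the paper's code).
CONFIG = {
    "batch_size": 1024,
    "policy_hidden_dims": (256, 256),
    "value_hidden_dims": (512, 512, 512),
    "activation": "gelu",
    "optimizer": "adam",
    "learning_rate": 3e-4,
    "target_smoothing_tau": 0.005,
}

def soft_update(target_params, online_params, tau=CONFIG["target_smoothing_tau"]):
    """Polyak-average the target network toward the online network:
    target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# Example: one smoothing step with tau = 0.005 on toy scalar "parameters".
target = [0.0, 1.0]
online = [1.0, 1.0]
target = soft_update(target, online)
print(target)  # prints [0.005, 1.0]
```

With tau = 0.005 the target network moves only 0.5% of the way toward the online network per step, which keeps bootstrapped value targets slowly varying and stabilizes training.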