Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers

Authors: Runyi Zhao, Sheng Xu, Bo Yue, Guiliang Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results demonstrate that ExICL can seamlessly and reliably generalize across different tasks and environments. To empirically validate our ExICL method, we assess its performance across a diverse set of tasks (including navigation, locomotion, and autonomous driving) and under various types of constraints (such as spatial, dynamic, and kinematic). (Section 4: Empirical Evaluation)
Researcher Affiliation | Academia | School of Data Science, The Chinese University of Hong Kong, Shenzhen
Pseudocode | Yes | Algorithm 1: Exploratory Inverse Constraint Learning (ExICL)
Open Source Code | Yes | The code is available at https://github.com/ZhaoRunyi/ExICL.
Open Datasets | Yes | Offline dataset for robot control tasks: we use the public offline dataset provided by Quan et al. (2024). Specifically, this offline dataset contains 250 trajectories in total: 200 suboptimal trajectories (each with 1000 steps) and 50 expert trajectories generated by a PPO-Lagrangian algorithm. CommonRoad-RL (Wang et al., 2021) environment with a velocity < 40 constraint: we use the processed highD (Krajewski et al., 2018) data provided by Liu et al. (2023).
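The CommonRoad-RL setting above imposes the constraint velocity < 40. The sketch below shows one common way to express such a constraint as a binary per-step cost. It is a minimal illustration under three assumptions. First, velocity is read directly from each state, in whatever units the environment reports. Second, a step costs 1 whenever the limit is reached or exceeded. Third, a trajectory counts as feasible only if no step violates the limit. `velocity_cost`, `trajectory_cost`, `is_feasible`, and `speed_limit` are hypothetical names and do not come from the ExICL code.

```python
from typing import Sequence


def velocity_cost(velocity: float, speed_limit: float = 40.0) -> float:
    """Hypothetical indicator cost: 1.0 when the velocity < speed_limit constraint is violated."""
    return 1.0 if velocity >= speed_limit else 0.0


def trajectory_cost(velocities: Sequence[float], speed_limit: float = 40.0) -> float:
    """Sum of per-step indicator costs along one trajectory (number of violating steps)."""
    return sum(velocity_cost(v, speed_limit) for v in velocities)


def is_feasible(velocities: Sequence[float], speed_limit: float = 40.0) -> bool:
    """Treat the constraint as hard: feasible only if no step violates it."""
    return trajectory_cost(velocities, speed_limit) == 0.0


# Usage: only the third step (41.0) breaks the limit, so the trajectory is infeasible.
example = [35.0, 39.9, 41.0, 38.0]
print(trajectory_cost(example))  # 1.0
print(is_feasible(example))      # False
```

An inverse constraint learner does not receive this function. It has to recover an equivalent cost from demonstrations, so a hand-written indicator like this one serves mainly as a ground-truth reference for evaluation.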
Dataset Splits | Yes | In each individual environment, the trajectories fall into three categories: 1) expert trajectories generated by an expert policy trained with the PPO-Lagrangian algorithm, with a stochasticity of 0.05 that allows for random actions; 2) constraint-violating trajectories created by a policy that accelerates the agent's movement directly toward the terminating location, with a stochasticity of 0.1; 3) random trajectories generated by a uniformly random policy. The numbers of pairs in the three kinds of trajectories are in a proportion of roughly 5:1:1. Specifically, this offline dataset contains 250 trajectories in total: 200 suboptimal trajectories (each with 1000 steps) and 50 expert trajectories generated by a PPO-Lagrangian algorithm.
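The three trajectory sources described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' data pipeline. "Stochasticity" is read as the probability of replacing the policy's action with one drawn uniformly from the action box: 0.05 for the expert and 0.1 for the constraint-violating policy. The uniformly random policy is then the same wrapper with stochasticity 1.0. The 5:1:1 proportion is applied to a hypothetical total count. `StochasticWrapper`, `split_counts`, and the action bounds are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]


class StochasticWrapper:
    """Hypothetical wrapper: with probability `stochasticity`, replace the base
    policy's action by one drawn uniformly from [low, high] in every dimension."""

    def __init__(self, base_policy: Callable[[Sequence[float]], Action],
                 stochasticity: float, low: float, high: float, seed: int = 0):
        self.base_policy = base_policy
        self.stochasticity = stochasticity
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def __call__(self, state: Sequence[float]) -> Action:
        action = self.base_policy(state)
        # rng.random() lies in [0, 1), so stochasticity=1.0 always randomizes.
        if self.rng.random() < self.stochasticity:
            return [self.rng.uniform(self.low, self.high) for _ in action]
        return action


def split_counts(total: int, ratio: Tuple[int, ...] = (5, 1, 1)) -> List[int]:
    """Allocate `total` items across sources in proportion to `ratio`,
    giving any rounding remainder to the first (expert) source."""
    unit = sum(ratio)
    counts = [total * r // unit for r in ratio]
    counts[0] += total - sum(counts)
    return counts


# Usage with a hypothetical budget of 700 items.
print(split_counts(700))  # [500, 100, 100]

expert = StochasticWrapper(lambda s: [0.0, 0.0], stochasticity=0.05, low=-1.0, high=1.0)
violator = StochasticWrapper(lambda s: [1.0, 1.0], stochasticity=0.1, low=-1.0, high=1.0)
uniform = StochasticWrapper(lambda s: [0.0, 0.0], stochasticity=1.0, low=-1.0, high=1.0)
```

Expressing all three sources with one wrapper makes the difference between them a single parameter, which keeps the 5:1:1 mixture easy to regenerate with other noise levels.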
Hardware Specification | No | No specific hardware details are given for running the experiments. The paper discusses simulation environments such as MuJoCo but does not specify the computational hardware (e.g., GPU/CPU models) used for training or evaluation.
Software Dependencies | No | The paper mentions using MuJoCo for the simulated environments and references "the official implementation of (Janner et al., 2022)" for the model architecture, but does not provide version numbers for any software libraries, frameworks, or environments.
Experiment Setup | Yes | Table 3: List of the utilized hyperparameters in the navigation tasks in the PointMaze and MuJoCo environments. This table gives specific values for Max Episode Length, Discount Factor, Policy Batch Size, Initial Lagrange Multiplier, Lagrange Multiplier Learning Rate, Guided Scale, Cost Model Horizon, Cost Model Learning Rate, Cost Model Update Step, Diffusion Steps, Diffusion Time Feature Dimension, Diffusion Time Hidden Dimension, Hidden Feature Dimension, Convolution Kernel Size, U-Net Depth, and Convolution Layer Dimension across the various environments.
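Two of the listed hyperparameters, the Initial Lagrange Multiplier and the Lagrange Multiplier Learning Rate, belong to the PPO-Lagrangian machinery mentioned in the dataset description. The sketch below shows the standard projected dual-ascent update they typically control. It assumes a budget (`cost_limit`) on the average episodic cost. The class name, field names, and example numbers are placeholders rather than the paper's settings; the actual values are listed in Table 3.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Projected dual ascent on lambda >= 0, as commonly used in PPO-Lagrangian.

    `value` plays the role of the Initial Lagrange Multiplier, `lr` the
    Lagrange Multiplier Learning Rate; `cost_limit` is a hypothetical budget.
    """
    value: float
    lr: float
    cost_limit: float

    def update(self, avg_episode_cost: float) -> float:
        # Raise lambda when the observed cost exceeds the budget, lower it otherwise,
        # then project back onto the feasible set lambda >= 0.
        self.value = max(0.0, self.value + self.lr * (avg_episode_cost - self.cost_limit))
        return self.value

    def penalized_advantage(self, reward_adv: float, cost_adv: float) -> float:
        # Scalarized advantage for the policy step: reward minus lambda-weighted cost.
        return reward_adv - self.value * cost_adv


# Usage with placeholder numbers: a cost of 15 against a budget of 5 raises lambda.
lam = LagrangeMultiplier(value=1.0, lr=0.1, cost_limit=5.0)
lam.update(15.0)
print(lam.value)
```

Some PPO-Lagrangian implementations additionally divide the penalized advantage by 1 + lambda to keep its scale stable. The sketch omits that variant.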