reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers

Authors: Runyi Zhao, Sheng Xu, Bo Yue, Guiliang Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical results demonstrate that ExICL can seamlessly and reliably generalize across different tasks and environments. To empirically validate our ExICL method, we assess its performance across a diverse set of tasks (including navigation, locomotion, and autonomous driving) and under various types of constraints (such as spatial, dynamic, and kinematic). Section 4: EMPIRICAL EVALUATION
Researcher Affiliation	Academia	School of Data Science, The Chinese University of Hong Kong, Shenzhen
Pseudocode	Yes	Algorithm 1 Exploratory Inverse Constraint Learning (ExICL)
Open Source Code	Yes	The code is available at https://github.com/ZhaoRunyi/ExICL.
Open Datasets	Yes	Offline Dataset for Robot Control Tasks. We use the public offline dataset provided by (Quan et al., 2024). Specifically, this offline dataset includes a total number of 250 trajectories, which obtains 200 suboptimal trajectories (each with 1000 steps) and 50 expert trajectories from a PPOlag algorithm. CommonRoad-RL (Wang et al., 2021) Environment with a velocity<40 constraints. We chose the processed HighD (Krajewski et al., 2018) data given by (Liu et al., 2023).
Dataset Splits	Yes	In each individual environment, the trajectories can be categorized into three parts: 1) expert trajectories generated by the expert policy trained under the PPO-Lagrangian algorithm and incorporates stochasticity of 0.05, allowing for random actions; 2) constraint-violating trajectories created by a policy that accelerates the agent's movement directly toward the terminating location, with stochasticity of 0.1; 3) random trajectories generated by the uniformly random policy. The proportion of the number of pairs in each kind of trajectory is around 5 : 1 : 1. Specifically, this offline dataset includes a total number of 250 trajectories, which obtains 200 suboptimal trajectories (each with 1000 steps) and 50 expert trajectories from a PPOlag algorithm.
Hardware Specification	No	No specific hardware details are mentioned for running the experiments. The paper discusses environments like Mujoco for simulation but does not specify the computational hardware (e.g., GPU/CPU models) used for training or evaluation.
Software Dependencies	No	The paper mentions using 'Mujoco' for simulated environments and references 'the official implementation of (Janner et al., 2022)' for model architecture, but does not provide specific version numbers for any software libraries, frameworks, or environments.
Experiment Setup	Yes	Table 3: List of the utilized hyperparameters in the navigation tasks in Point Maze and MuJoCo environments. This table includes specific values for Max Episode Length, Discount Factor, Policy Batchsize, Initial Lagrange Multiplier, Lagrange Multiplier Learning Rate, Guided Scale, Cost Model Horizon, Cost Model Learning Rate, Cost Model Update Step, Diffusion steps, Diffusion Time Feature Dimension, Diffusion Time Hidden Dimension, Hidden Feature Dimension, Convolution Kernel Size, U-Net depth, and Convolution Layers Dimension across various environments.