RIZE: Adaptive Regularization for Imitation Learning

Authors: Adib Karimi, Mohammad Mehdi Ebadzadeh

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our method achieves expert-level performance on complex MuJoCo and Adroit environments, surpassing baseline methods on the Humanoid-v2 task with limited expert demonstrations. Extensive experiments and ablation studies further validate the effectiveness of the approach and provide insights into reward dynamics in imitation learning. Our source code is available at https://github.com/adibka/RIZE.
Researcher Affiliation | Academia | Adib Karimi (EMAIL), Department of Computer Engineering, Amirkabir University of Technology; Mohammad Mehdi Ebadzadeh (EMAIL), Department of Computer Engineering, Amirkabir University of Technology
Pseudocode | Yes | Algorithm 1 RIZE
1: Initialize Zϕ, πθ, λ^{πE}, and λ^π
2: for step t in {1, ..., N} do
3:   Calculate Q(s, a) = E[Zϕ(s, a)] using Eq. 5
4:   Update Zϕ using Eq. 9:
5:     ϕ_{t+1} ← ϕ_t − β_Z ∇_ϕ[L(ϕ)]
6:   Update πθ (as in SAC):
7:     θ_{t+1} ← θ_t + β_π ∇_θ E_{s∼D, a∼πθ(·|s)}[min_{k=1,2} Q_k(s, a) − α log πθ(a|s)]
8:   Update λ^π and λ^{πE} using Eq. 8:
9:     λ^π_{t+1} ← λ^π_t − β_{λ^π} ∇_{λ^π} Γ(R_Q, λ)
10:    λ^{πE}_{t+1} ← λ^{πE}_t − β_{λ^{πE}} ∇_{λ^{πE}} Γ(R_Q, λ)
11: end for
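The adaptive target-reward updates in Algorithm 1 can be sketched in plain Python. This is a toy scalar illustration, not the paper's implementation: Eqs. 5, 8, and 9 are not reproduced in this summary, so the critic/policy updates are elided and the regularizer `Γ` and reward estimates `R_Q` below are hypothetical stand-ins (here Γ(R_Q, λ) = (R_Q − λ)², so ∇_λ Γ = −2(R_Q − λ)).

```python
# Toy sketch of Algorithm 1 (RIZE): gradient descent on the target-reward
# parameters lambda^{piE} and lambda^{pi}. Gamma and R_Q are assumptions,
# not the paper's Eq. 8.

def grad_gamma(r_q: float, lam: float) -> float:
    # Assumed regularizer Gamma(R_Q, lam) = (R_Q - lam)**2,
    # so dGamma/dlam = -2 * (R_Q - lam).
    return -2.0 * (r_q - lam)

def train(num_steps: int = 100, beta_lam: float = 0.1):
    lam_pi_e, lam_pi = 10.0, 5.0   # initial target rewards (values from Table 1)
    for t in range(num_steps):
        # ... update critic Z_phi (Eq. 9) and policy pi_theta (SAC-style) ...
        r_q_expert, r_q_policy = 9.0, 4.0  # placeholder reward estimates
        # Descend Gamma w.r.t. each target-reward parameter (lines 9-10):
        lam_pi_e -= beta_lam * grad_gamma(r_q_expert, lam_pi_e)
        lam_pi   -= beta_lam * grad_gamma(r_q_policy, lam_pi)
    return lam_pi_e, lam_pi
```

With the placeholder rewards held fixed, each λ contracts toward its corresponding R_Q, which is the intended adaptive-regularization behavior.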
Open Source Code | Yes | Our source code and all implementation details used in our experiments are publicly available at https://github.com/adibka/RIZE.
Open Datasets | Yes | We study continuous-control imitation learning from state–action expert samples, evaluating our algorithm on five MuJoCo (Todorov et al., 2012) benchmarks (HalfCheetah-v2, Walker2d-v2, Ant-v2, Humanoid-v2, Hopper-v2) and one Adroit Hand task (Rajeswaran et al., 2018). ... For Hammer-v1 from the Adroit suite (Rajeswaran et al., 2018), we use the D4RL dataset (Fu et al., 2021) and filter the top 100 episodes from the original 5,000.
Dataset Splits | Yes | We assess each method with three and ten expert trajectories. ... Expert trajectories for these tasks are taken from IQ-Learn (Garg et al., 2021) and were generated with Soft Actor-Critic (Haarnoja et al., 2018); each trajectory contains 1,000 state–action transitions. ... For Hammer-v1 from the Adroit suite (Rajeswaran et al., 2018), we use the D4RL dataset (Fu et al., 2021) and filter the top 100 episodes from the original 5,000.
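The Hammer-v1 preprocessing (keeping the top 100 of 5,000 episodes) amounts to ranking episodes by return and truncating. A minimal sketch, assuming episodes are already segmented into (return, transitions) pairs; real D4RL data arrives as flat arrays and would need episode segmentation first:

```python
def filter_top_episodes(episodes, k=100):
    """Keep the k episodes with the highest return.

    `episodes` is assumed to be a list of (episode_return, transitions)
    tuples -- a hypothetical intermediate format, not D4RL's raw layout.
    """
    ranked = sorted(episodes, key=lambda ep: ep[0], reverse=True)
    return ranked[:k]
```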
Hardware Specification | No | The paper does not provide specific hardware details such as CPU, GPU models, or cloud instance types used for running the experiments.
Software Dependencies | No | Our architecture integrates components from Distributional SAC (DSAC) (Ma et al., 2020) and IQ-Learn (Garg et al., 2021), with hyperparameters tuned through search and ablation studies. ... The paper mentions using DSAC and IQ-Learn components but does not specify version numbers for general software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | Our architecture integrates components from Distributional SAC (DSAC) (Ma et al., 2020) and IQ-Learn (Garg et al., 2021), with hyperparameters tuned through search and ablation studies. Key configurations for experiments involving three and ten demonstrations are summarized in Table 1. ... The critic network is implemented as a three-layer multilayer perceptron (MLP) with 256 units per layer, trained using a learning rate of 3e-4. The policy network is a four-layer MLP, also with 256 units per layer. ... replay buffer size 10^6, batch size 256, 24 quantile levels, and 10,000 pretraining steps. ... Across all tasks, we set initial target reward parameters as λ^{πE} = 10 and λ^π = 5.
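The stated architecture can be sketched with NumPy. Input/output dimensions are hypothetical (the summary gives only layer counts and widths), "three-layer"/"four-layer" is read here as the number of hidden layers, and the 24-output quantile head is inferred from the 24 quantile levels listed in the setup:

```python
import numpy as np

def mlp(sizes, rng):
    """Build (weight, bias) pairs for a fully connected net (He init)."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers
    return x

rng = np.random.default_rng(0)
obs_dim, act_dim = 17, 6          # hypothetical dimensions for illustration
n_quantiles = 24                  # 24 quantile levels (Table 1)

# Critic Z_phi: three hidden layers of 256 units, one output per quantile.
critic = mlp([obs_dim + act_dim, 256, 256, 256, n_quantiles], rng)
# Policy pi_theta: four hidden layers of 256 units, outputting action means.
policy = mlp([obs_dim, 256, 256, 256, 256, act_dim], rng)

z = forward(critic, np.zeros(obs_dim + act_dim))  # quantile estimates of Z
q = z.mean()                      # Q(s, a) = E[Z_phi(s, a)] over quantiles
```

Averaging the 24 quantile outputs recovers the scalar Q-value used in the policy update, mirroring line 3 of Algorithm 1.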