Quasipseudometric Value Functions with Dense Rewards

Authors: Khadichabonu Valieva, Bikramjit Banerjee

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting either improves upon or preserves the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.
Researcher Affiliation | Academia | Khadichabonu Valieva (EMAIL), School of Computing Sciences & Computer Engineering, The University of Southern Mississippi; Bikramjit Banerjee (EMAIL), School of Computing Sciences & Computer Engineering, The University of Southern Mississippi
Pseudocode | Yes | Algorithm 1: GCRL with Potential-Shaped Dense Rewards
Open Source Code | Yes | We use the MRN code repository publicly available at https://github.com/Cranial-XIX/metric-residual-network, which implements DDPG+HER with MRN and PQE (among others) critics for the GCRL manipulation tasks. We make two simple modifications: (1) add Eq. 6 to the default (sparse) reward function (Eq. 1), as shown on line 13 of Alg. 1; (2) use Eq. 13 to clip the critic's output, as shown on line 14 of Alg. 1, but only for d_ac. No other changes were made to any algorithm or neural architecture. The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards.
Open Datasets | Yes | We use GCRL benchmark manipulation tasks with the Fetch robot and Shadow-hand domains (Plappert et al., 2018); see Fig. 1(b). We use two critic architectures to demonstrate generality: MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022). Gymnasium-Robotics Documentation. Farama Foundation, 2018. https://robotics.farama.org/.
Dataset Splits | No | The paper mentions using "12 standard benchmark environments" and that "learning curves are averaged over five independent trials". While it refers to standard benchmarks, it does not explicitly provide percentages, counts, or methodology for training/test/validation splits. Such details are usually part of the benchmark's documentation, but they are not detailed here.
Hardware Specification | Yes | All experiments were run on NVIDIA Quadro RTX 6000 GPUs with 24 GiB of memory each, running Ubuntu 24.04.
Software Dependencies | No | The paper mentions using the "MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022)" critic architectures and that it "implements DDPG+HER". It also notes the operating system, Ubuntu 24.04. However, it does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used to implement these methods.
Experiment Setup | Yes | The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards.
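The two modifications described above (adding a potential-based shaping term to the default sparse reward, and clipping the critic's output for the d_ac variant) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the potential function `potential` and the clipping rule `clip_critic` are hypothetical stand-ins for the paper's Eq. 6 and Eq. 13, which are derived from the quasipseudometric value function and are not reproduced here.

```python
import numpy as np

GAMMA = 0.98  # discount factor; an assumed value, common in Fetch benchmarks

def sparse_reward(achieved, goal, eps=0.05):
    """Standard GCRL sparse reward (the Eq. 1 style): 0 on success, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved - goal) <= eps else -1.0

def potential(x, goal):
    """Hypothetical potential phi(x): negative Euclidean distance to the goal.
    Stands in for the paper's Eq. 6, which uses the quasipseudometric value."""
    return -np.linalg.norm(x - goal)

def dense_reward(achieved, next_achieved, goal, eta=0.01):
    """Sparse reward plus an eta-weighted potential-based shaping term,
    mirroring the modification on line 13 of Alg. 1:
    F = gamma * phi(s') - phi(s), which preserves the optimal policy."""
    shaping = GAMMA * potential(next_achieved, goal) - potential(achieved, goal)
    return sparse_reward(next_achieved, goal) + eta * shaping

def clip_critic(q):
    """Stand-in for the Eq. 13 clipping of the critic's output (applied only
    for d_ac): sparse-reward returns are nonpositive, so the Q-estimate is
    clipped to be at most zero."""
    return np.minimum(q, 0.0)
```

With eta = 0.01 (the value the paper selects for d_ac), a transition that moves the achieved state closer to the goal receives a small positive shaping bonus on top of the sparse reward, which is what makes the reward signal dense.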