Quasipseudometric Value Functions with Dense Rewards

Authors: Khadichabonu Valieva, Bikramjit Banerjee

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting either improves upon or preserves the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity.
Researcher Affiliation | Academia | Khadichabonu Valieva (EMAIL), School of Computing Sciences & Computer Engineering, The University of Southern Mississippi; Bikramjit Banerjee (EMAIL), School of Computing Sciences & Computer Engineering, The University of Southern Mississippi
Pseudocode | Yes | Algorithm 1: GCRL with Potential-Shaped Dense Rewards
Open Source Code | Yes | We use the MRN code repository publicly available at https://github.com/Cranial-XIX/metric-residual-network, which implements DDPG+HER with MRN and PQE (among others) critics for the GCRL manipulation tasks. We make two simple modifications: (1) add Eq. 6 to the default (sparse) reward function (Eq. 1), as shown on line 13 of Alg. 1; (2) use Eq. 13 to clip the critic's output, as shown on line 14 of Alg. 1, but only for d_ac. No other changes were made to any algorithm or neural architecture. The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards.
Open Datasets | Yes | We use GCRL benchmark manipulation tasks with the Fetch robot and Shadow-hand domains (Plappert et al., 2018); see Fig. 1(b). We use two critic architectures to demonstrate generality: MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022). Gymnasium-Robotics Documentation. Farama Foundation, 2018. https://robotics.farama.org/.
Dataset Splits | No | The paper mentions using "12 standard benchmark environments" and that "learning curves are averaged over five independent trials". While it refers to standard benchmarks, it does not explicitly provide percentages, counts, or methodology for training/test/validation splits. Such details are usually part of the benchmark's documentation, but they are not detailed here.
Hardware Specification | Yes | All experiments were run on NVIDIA Quadro RTX 6000 GPUs with 24 GiB of memory each, running Ubuntu 24.04.
Software Dependencies | No | The paper mentions using the "MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022)" critic architectures and that it "implements DDPG+HER". It also notes the operating system, Ubuntu 24.04. However, it does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used to implement these methods.
Experiment Setup | Yes | The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards.
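The two modifications described above (adding a potential-based shaping term to the default sparse reward, and clipping the critic's output for the d_ac variant) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the potential function `potential` and the clipping rule `clip_critic` are hypothetical stand-ins for the paper's Eq. 6 and Eq. 13, which are derived from the quasipseudometric value function and are not reproduced here.

```python
import numpy as np

GAMMA = 0.98  # discount factor; an assumed value, common in Fetch benchmarks

def sparse_reward(achieved, goal, eps=0.05):
    """Standard GCRL sparse reward (the Eq. 1 style): 0 on success, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved - goal) <= eps else -1.0

def potential(x, goal):
    """Hypothetical potential phi(x): negative Euclidean distance to the goal.
    Stands in for the paper's Eq. 6, which uses the quasipseudometric value."""
    return -np.linalg.norm(x - goal)

def dense_reward(achieved, next_achieved, goal, eta=0.01):
    """Sparse reward plus an eta-weighted potential-based shaping term,
    mirroring the modification on line 13 of Alg. 1:
    F = gamma * phi(s') - phi(s), which preserves the optimal policy."""
    shaping = GAMMA * potential(next_achieved, goal) - potential(achieved, goal)
    return sparse_reward(next_achieved, goal) + eta * shaping

def clip_critic(q):
    """Stand-in for the Eq. 13 clipping of the critic's output (applied only
    for d_ac): sparse-reward returns are nonpositive, so the Q-estimate is
    clipped to be at most zero."""
    return np.minimum(q, 0.0)
```

With eta = 0.01 (the value the paper selects for d_ac), a transition that moves the achieved state closer to the goal receives a small positive shaping bonus on top of the sparse reward, which is what makes the reward signal dense.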