Quasipseudometric Value Functions with Dense Rewards
Authors: Khadichabonu Valieva, Bikramjit Banerjee
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasipseudometric value function in our dense reward setting indeed either improves upon, or preserves, the sample complexity of training with sparse rewards. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity. |
| Researcher Affiliation | Academia | Khadichabonu Valieva EMAIL School of Computing Sciences & Computer Engineering The University of Southern Mississippi Bikramjit Banerjee EMAIL School of Computing Sciences & Computer Engineering The University of Southern Mississippi |
| Pseudocode | Yes | Algorithm 1 GCRL with Potential-Shaped Dense Rewards |
| Open Source Code | Yes | We use the MRN code repository publicly available at: https://github.com/Cranial-XIX/metric-residual-network, which implements DDPG+HER with MRN and PQE (among others) critics for the GCRL manipulation tasks. We make two simple modifications: (1) add Eq. 6 to the default (sparse) reward function (Eq. 1) as shown on line 13 in Alg. 1; (2) use Eq. 13 to clip the critic's output as shown on line 14 in Alg. 1, but only for d_ac. No other changes were made to any algorithm or neural architecture. The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards. |
| Open Datasets | Yes | We use GCRL benchmark manipulation tasks with the Fetch robot and Shadow-hand domains (Plappert et al., 2018); see Fig. 1(b). We use two critic architectures to demonstrate generality: MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022). Gymnasium-Robotics Documentation. Farama Foundation, 2018. https://robotics.farama.org/. |
| Dataset Splits | No | The paper mentions using "12 standard benchmark environments" and that "Learning curves are averaged over five independent trials". While it refers to standard benchmarks, it does not explicitly provide specific percentages, counts, or methodology for training/test/validation splits within the paper. Such details are usually part of the benchmark's documentation, but not detailed here. |
| Hardware Specification | Yes | All experiments were run on NVIDIA Quadro RTX 6000 GPUs with 24 GiB of memory each, running Ubuntu 24.04. |
| Software Dependencies | No | The paper mentions using "MRN (Liu et al., 2023) and PQE (Wang & Isola, 2022) critic architectures" and that it "implements DDPG+HER". It also notes the operating system "Ubuntu 24.04". However, it does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used to implement these methods. |
| Experiment Setup | Yes | The newly added parameter η was set to 0.01 for d_ac and 0.2 for d_E. These values were selected from the sets {0.01, 0.02, 0.03, 0.04, 0.05} for d_ac and {0.02, 0.2, 2.0} for d_E, using performance improvement with the MRN architecture as the criterion. Our modified versions of the code and scripts are also available at: https://github.com/khadimon/GCRL-Dense-Rewards. |
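The modification the report describes, adding a dense shaping term (Eq. 6) to the default sparse GCRL reward, scaled by a new parameter η, can be sketched as follows. This is a minimal illustration only: the paper's Eq. 6 is not reproduced in the report, so a Euclidean distance stands in for the quasipseudometric potential, and the assumption that η scales the shaping term (rather than the full reward) is ours. All function names here are hypothetical.

```python
import numpy as np

def sparse_reward(achieved, goal, eps=0.05):
    # Standard sparse GCRL reward: 0 on success, -1 otherwise (cf. Eq. 1).
    return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

def shaped_reward(achieved, next_achieved, goal, eta=0.01, gamma=0.98, eps=0.05):
    # Potential-based shaping term added to the sparse reward.
    # Potential Phi(s) = -d(s, g); Euclidean d is a stand-in for the
    # paper's quasipseudometric. eta = 0.01 matches the reported d_ac setting.
    phi = lambda x: -np.linalg.norm(x - goal)
    shaping = gamma * phi(next_achieved) - phi(achieved)
    return sparse_reward(next_achieved, goal, eps) + eta * shaping
```

Because the shaping term is potential-based, it leaves the optimal policy of the underlying sparse-reward task unchanged while providing a learning signal on every transition.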