PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning

Authors: Utsav Singh, Vinay Purushothaman Namboodiri

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform extensive experiments on challenging environments and show that PEAR is able to outperform various hierarchical and non-hierarchical baselines and achieve up to 80% success rates in complex sparse robotic control tasks where other baselines typically fail to show significant progress. We also perform ablations to thoroughly analyse the importance of our various design choices. Finally, we perform real world robotic experiments on complex tasks and demonstrate that PEAR consistently outperforms the baselines."
Researcher Affiliation | Academia | Utsav Singh, CSE Dept., IIT Kanpur, India; Vinay P. Namboodiri, CS Dept., University of Bath, Bath, UK
Pseudocode | Yes | Algorithm 1 (Adaptive Relabeling); Algorithm 2 (PEAR)
Open Source Code | Yes | "Please refer to the supplementary for a video depicting qualitative results, and the implementation code."
Open Datasets | No | The paper describes generating its own expert demonstrations for the various environments (e.g., "For maze navigation, we use path planning RRT (LaValle, 1998) algorithm to generate expert demonstration trajectories." and "For pick and place, we hard coded an optimal trajectory generation policy for generating demonstrations"). It does not state that these generated datasets are publicly available, nor does it rely on existing public datasets for its experiments.
Dataset Splits | Yes | "We select 100 randomly generated mazes each for training, testing and validation. ... For selecting train, test and validation mazes, we first randomly generate 300 distinct environments with different block and target goal positions, and then randomly divide them into 100 train, test and validation mazes each. ... While training our hierarchical approach, we select 100 randomly generated initial and final rope configurations each for training, testing and validation."
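The 300-into-100/100/100 split quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the fixed seed and the use of integer IDs as stand-ins for generated mazes are our own assumptions.

```python
import random

# Hypothetical sketch of the described split: generate 300 distinct
# environment IDs (stand-ins for randomly generated mazes) and partition
# them into 100 train / 100 test / 100 validation environments.
random.seed(0)  # assumed seed, only so this sketch is repeatable
env_ids = list(range(300))
random.shuffle(env_ids)
train, test, val = env_ids[:100], env_ids[100:200], env_ids[200:]

# The three sets are disjoint and jointly cover all 300 environments.
assert len(set(train) | set(test) | set(val)) == 300
```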
Hardware Specification | Yes | "We perform the experiments on two systems, each with Intel Core i7 processors, equipped with 48GB RAM and Nvidia GeForce GTX 1080 GPUs."
Software Dependencies | No | "In our experiments, we use Soft Actor Critic (Haarnoja et al., 2018b)." The paper names the software component "Soft Actor Critic" but gives no version number for it or for other key dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "The actor, critic and discriminator networks are formulated as 3 layer fully connected networks with 512 neurons in each layer. The regularization weight hyper-parameter Ψ is set at 0.001, 0.005, 0.005, 0.005, 0.005, and 0.005, the population hyper-parameter p is set to be 1.1e4, 2500, 2500, 2500, 3.9e5, and 1.4e4, and distance threshold hyper-parameter Qthresh is set at 10, 0, 0, 0, 0, and 0 for maze, pick and place, bin, hollow, rope and kitchen tasks respectively."
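For readability, the per-task hyper-parameters quoted above can be restated as a small lookup table. The dict layout and key names below are our own, not from the paper; only the numeric values are taken from the quoted setup.

```python
# Per-task hyper-parameters as reported in the quoted setup:
# regularization weight psi, population size p, and distance
# threshold Q_thresh. Key names are illustrative assumptions.
HPARAMS = {
    # task:           (psi,   p,     Q_thresh)
    "maze":           (0.001, 1.1e4, 10),
    "pick_and_place": (0.005, 2500,  0),
    "bin":            (0.005, 2500,  0),
    "hollow":         (0.005, 2500,  0),
    "rope":           (0.005, 3.9e5, 0),
    "kitchen":        (0.005, 1.4e4, 0),
}
```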