PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning

Authors: Utsav Singh, Vinay Purushothaman Namboodiri

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform extensive experiments on challenging environments and show that PEAR is able to outperform various hierarchical and non-hierarchical baselines and achieve up to 80% success rates in complex sparse robotic control tasks where other baselines typically fail to show significant progress. We also perform ablations to thoroughly analyse the importance of our various design choices. Finally, we perform real world robotic experiments on complex tasks and demonstrate that PEAR consistently outperforms the baselines."
Researcher Affiliation | Academia | Utsav Singh, CSE Dept., IIT Kanpur, India; Vinay P. Namboodiri, CS Dept., University of Bath, Bath, UK
Pseudocode | Yes | Algorithm 1 (Adaptive Relabeling); Algorithm 2 (PEAR)
Open Source Code | Yes | "Please refer to the supplementary for a video depicting qualitative results, and the implementation code."
Open Datasets | No | The paper describes generating its own expert demonstrations for the various environments (e.g., "For maze navigation, we use path planning RRT (LaValle, 1998) algorithm to generate expert demonstration trajectories." and "For pick and place, we hard coded an optimal trajectory generation policy for generating demonstrations"). It does not state that these generated datasets are publicly available, nor does it rely on existing public datasets for its experiments.
Dataset Splits | Yes | "We select 100 randomly generated mazes each for training, testing and validation. ... For selecting train, test and validation mazes, we first randomly generate 300 distinct environments with different block and target goal positions, and then randomly divide them into 100 train, test and validation mazes each. ... While training our hierarchical approach, we select 100 randomly generated initial and final rope configurations each for training, testing and validation."
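The 300-into-100/100/100 split quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the fixed seed and the use of integer IDs as stand-ins for generated mazes are our own assumptions.

```python
import random

# Hypothetical sketch of the described split: generate 300 distinct
# environment IDs (stand-ins for randomly generated mazes) and partition
# them into 100 train / 100 test / 100 validation environments.
random.seed(0)  # assumed seed, only so this sketch is repeatable
env_ids = list(range(300))
random.shuffle(env_ids)
train, test, val = env_ids[:100], env_ids[100:200], env_ids[200:]

# The three sets are disjoint and jointly cover all 300 environments.
assert len(set(train) | set(test) | set(val)) == 300
```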
Hardware Specification | Yes | "We perform the experiments on two systems, each with Intel Core i7 processors, equipped with 48GB RAM and Nvidia GeForce GTX 1080 GPUs."
Software Dependencies | No | "In our experiments, we use Soft Actor Critic (Haarnoja et al., 2018b)." The paper names the software component "Soft Actor Critic" but gives no version number for it or for other key dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "The actor, critic and discriminator networks are formulated as 3 layer fully connected networks with 512 neurons in each layer. The regularization weight hyper-parameter Ψ is set at 0.001, 0.005, 0.005, 0.005, 0.005, and 0.005, the population hyper-parameter p is set to be 1.1e4, 2500, 2500, 2500, 3.9e5, and 1.4e4, and distance threshold hyper-parameter Qthresh is set at 10, 0, 0, 0, 0, and 0 for maze, pick and place, bin, hollow, rope and kitchen tasks respectively."
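For readability, the per-task hyper-parameters quoted above can be restated as a small lookup table. The dict layout and key names below are our own, not from the paper; only the numeric values are taken from the quoted setup.

```python
# Per-task hyper-parameters as reported in the quoted setup:
# regularization weight psi, population size p, and distance
# threshold Q_thresh. Key names are illustrative assumptions.
HPARAMS = {
    # task:           (psi,   p,     Q_thresh)
    "maze":           (0.001, 1.1e4, 10),
    "pick_and_place": (0.005, 2500,  0),
    "bin":            (0.005, 2500,  0),
    "hollow":         (0.005, 2500,  0),
    "rope":           (0.005, 3.9e5, 0),
    "kitchen":        (0.005, 1.4e4, 0),
}
```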