HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation

Authors: Hanxiang Ren, Li Sun, Xulong Wang, Pei Zhou, Zewen Wu, Siyan Dong, Difan Zou, Youyi Zheng, Yanchao Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on locomotion and manipulation benchmarks show that HyPoGen significantly outperforms state-of-the-art methods in generating policies for unseen target tasks without any demonstrations, achieving higher success rates and underscoring the potential of optimization-biased hypernetworks in advancing generalizable policy generation.
Researcher Affiliation | Collaboration | Hanxiang Ren (ZJU & HKU), Li Sun (HKU), Xulong Wang (ZJU), Pei Zhou (HKU), Zewen Wu (HKU & Transcengram), Siyan Dong (HKU), Difan Zou (HKU), Youyi Zheng (ZJU), Yanchao Yang (HKU). Co-first authors as marked in the paper (EMAIL, EMAIL).
Pseudocode | No | The paper describes the methodology using mathematical equations and prose but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and data are available at: https://github.com/ReNginx/HyPoGen. We have made significant efforts to ensure the reproducibility of our work. The code, datasets, and instructions necessary to replicate our experiments are publicly available at https://github.com/ReNginx/HyPoGen.
Open Datasets | Yes | We leverage two sets of tasks for evaluation. The first set is derived from MuJoCo environments... The second set of tasks is sourced from the ManiSkill environment... For MuJoCo environments, we pre-train a TD3 (Fujimoto et al., 2018) agent on each specification separately for 1 x 10^6 steps as the expert. For each specification, 10 trajectories of length 1000 are collected as demonstrations. For ManiSkill environments, we pre-train a PPO (Schulman et al., 2017) agent for 4 x 10^6 steps and collect 1000 successful trajectories for each specification.
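The demonstration-collection protocol above (rolling out a pretrained expert, e.g. TD3, for a fixed number of trajectories per specification) can be sketched as follows. This is a minimal illustration, not the authors' code: `collect_trajectories`, the Gymnasium-style `env`, and the `policy` callable are all hypothetical names assumed here.

```python
def collect_trajectories(env, policy, n_trajs, max_len=1000):
    """Roll out a pretrained expert policy to collect demonstrations.

    Sketch of the paper's setup: e.g. 10 trajectories of length 1000
    per MuJoCo specification from a TD3 expert. `env` is assumed to
    follow the Gymnasium API; `policy` maps observation -> action.
    """
    trajs = []
    for _ in range(n_trajs):
        obs, _ = env.reset()
        traj = []
        for _ in range(max_len):
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            traj.append((obs, action, reward))
            if terminated or truncated:
                break  # episode ended before max_len steps
        trajs.append(traj)
    return trajs
```

For the ManiSkill setting, the same loop would additionally filter for successful episodes until 1000 successes are collected per specification.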
Dataset Splits | Yes | For MuJoCo environments... We randomly split the specifications into training and test sets using a 20% to 80% ratio. At test time, we evaluate performance by calculating the average reward of rolled-out policies. The training/test splitting process is repeated five times, and the average is calculated to reduce the effects of randomness. For ManiSkill environments... We uniformly sample 30% of the data for training and use the remaining 70% for testing.
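The repeated-split protocol described above (20% of specifications for training, 80% for testing, repeated five times) can be sketched as below. This is an illustrative reconstruction under stated assumptions; `split_specs` and its parameters are hypothetical names, not the authors' implementation.

```python
import random

def split_specs(specs, train_ratio=0.2, n_repeats=5, seed=0):
    """Repeat a random train/test split of task specifications.

    Mirrors the reported MuJoCo protocol: 20% train / 80% test,
    repeated five times so results can be averaged over splits.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        shuffled = specs[:]          # copy so the input stays untouched
        rng.shuffle(shuffled)
        n_train = max(1, int(len(shuffled) * train_ratio))
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

splits = split_specs([f"spec_{i}" for i in range(20)])
```

The ManiSkill setting would use `train_ratio=0.3` with a single split, per the quoted text.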
Hardware Specification | Yes | We train our model on an NVIDIA 4090 GPU with a batch size of 512 for 2000 epochs (approximately 11 hours); we use the same hyperparameters in ManiSkill except for training 1000 epochs (approximately 10 hours).
Software Dependencies | No | We implemented our method in PyTorch (Paszke et al., 2019), and the hyperparameters are reported in Tab. 10, Tab. 11, and Tab. 12.
Experiment Setup | Yes | Implementation details of HyPoGen. ... We use K = 8 hypernet blocks and apply the Adam (Kingma & Ba, 2015) optimizer with learning rate 1e-4. We train our model on an NVIDIA 4090 GPU with a batch size of 512 for 2000 epochs (approximately 11 hours); we use the same hyperparameters in ManiSkill except for training 1000 epochs (approximately 10 hours). We detail our hyperparameters in Sec. B.2.
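The reported training setup can be summarized in one configuration object. Only the values quoted above come from the paper; the `TrainConfig` class and its field names are illustrative, not taken from the authors' repository.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hyperparameters as reported in the paper; field names are ours."""
    num_hypernet_blocks: int = 8      # K = 8 hypernet blocks
    optimizer: str = "adam"           # Adam (Kingma & Ba, 2015)
    learning_rate: float = 1e-4
    batch_size: int = 512
    epochs_mujoco: int = 2000         # ~11 h on an NVIDIA 4090
    epochs_maniskill: int = 1000      # ~10 h, otherwise same settings

cfg = TrainConfig()
```

Every other hyperparameter is deferred to Sec. B.2 and Tabs. 10-12 of the paper, so nothing beyond the quoted values is filled in here.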