Horizon Generalization in Reinforcement Learning

Authors: Vivek Myers, Catherine Ji, Benjamin Eysenbach

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present new experimental results and recall findings from prior work in support of our theoretical results. Taken together, our results open the door to studying how techniques for invariance and generalization developed in other areas of machine learning might be adapted to achieve this alluring property."
Researcher Affiliation | Academia | Vivek Myers (UC Berkeley, EMAIL), Catherine Ji (Princeton University, EMAIL), Benjamin Eysenbach (Princeton University, EMAIL)
Pseudocode | No | The paper describes its methods using mathematical definitions and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | Yes | Website and code: https://horizon-generalization.github.io
Open Datasets | Yes | "Ant (Figure 5): For this task we used a version of the Ant environment from Bortkiewicz et al. (2024) modified to have variable start positions and distances to the goal. Ant Maze and Humanoid (Figure 6): Both environments are modified versions of the Ant Maze and Humanoid environments from Bortkiewicz et al. (2024)."
Dataset Splits | Yes | "We generated 3000 trajectories of length 50 using a random policy. Evaluation is done over 1000 randomly-sampled start-goal pairs. We start by running a series of experiments to compare the horizon generalization of different learning algorithms (CRL (Eysenbach et al., 2022) and SAC (Haarnoja et al., 2018)) and distance metric architectures (details in Appendix E). The results in Fig. 5 highlight that both the learning algorithm and the architecture can play an important role in horizon generalization, while also underscoring that achieving high horizon generalization in high-dimensional settings remains an open problem. See Table 1 for a summary of the methods used in these experiments."
Hardware Specification | No | The paper acknowledges 'Princeton Research Computing for assistance with the numerical experiments' but does not provide specific hardware details such as GPU/CPU models or memory amounts.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'neural networks with Swish activations', and provides a code snippet using `jax.nn.softmax` and `optax.kl_divergence`, but it does not specify version numbers for any software libraries or dependencies.
Experiment Setup | Yes | "We used a representation dimension of 16, a batch size of 256, neural networks with 2 hidden layers of width 32 and Swish activations, γ = 0.9, and Adam optimizer with learning rate 3 × 10⁻³."