Horizon Generalization in Reinforcement Learning

Authors: Vivek Myers, Catherine Ji, Benjamin Eysenbach

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present new experimental results and recall findings from prior work in support of our theoretical results. Taken together, our results open the door to studying how techniques for invariance and generalization developed in other areas of machine learning might be adapted to achieve this alluring property."
Researcher Affiliation | Academia | Vivek Myers (UC Berkeley, EMAIL), Catherine Ji (Princeton University, EMAIL), Benjamin Eysenbach (Princeton University, EMAIL)
Pseudocode | No | The paper describes its methods using mathematical definitions and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | Yes | Website and code: https://horizon-generalization.github.io
Open Datasets | Yes | "Ant (Figure 5): For this task we used a version of the Ant environment from Bortkiewicz et al. (2024) modified to have variable start positions and distances to the goal. Ant Maze and Humanoid (Figure 6): Both environments are modified versions of the Ant Maze and Humanoid environments from Bortkiewicz et al. (2024)."
Dataset Splits | Yes | "We generated 3000 trajectories of length 50 using a random policy. Evaluation is done over 1000 randomly-sampled start-goal pairs. We start by running a series of experiments to compare the horizon generalization of different learning algorithms (CRL (Eysenbach et al., 2022) and SAC (Haarnoja et al., 2018)) and distance metric architectures (details in Appendix E). The results in Fig. 5 highlight that both the learning algorithm and the architecture can play an important role in horizon generalization, while also underscoring that achieving high horizon generalization in high-dimensional settings remains an open problem. See Table 1 for a summary of the methods used in these experiments."
Hardware Specification | No | The paper acknowledges 'Princeton Research Computing for assistance with the numerical experiments' but does not provide specific hardware details such as GPU/CPU models or memory amounts.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'neural networks with Swish activations', and provides a code snippet using `jax.nn.softmax` and `optax.kl_divergence`, but it does not specify version numbers for any software libraries or dependencies.
Experiment Setup | Yes | "We used a representation dimension of 16, a batch size of 256, neural networks with 2 hidden layers of width 32 and Swish activations, γ = 0.9, and Adam optimizer with learning rate 3 × 10⁻³."