Generative Proto-Sequence: Sequence-Level Decision Making for Long-Horizon Reinforcement Learning

Authors: Netanel Fried, Liad Giladi, Gilad Katz

TMLR 2025

Reproducibility Variable — Result — LLM Response
Research Type — Experimental. "Evaluations across diverse maze navigation tasks of varying sizes and complexities demonstrate that GPS outperforms leading action repetition and temporal methods in the large majority of tested configurations, where it converges faster and achieves higher success rates. [...] We provide empirical results on challenging maze benchmarks, showing improvements over top-performing action repetition and temporal methods baselines in metrics such as convergence speed and success rate."
Researcher Affiliation — Academia. Netanel Fried (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Liad Giladi (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Gilad Katz (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev.
Pseudocode — No. The paper describes the architecture and components in detail across sections such as "3.1 The Actor", "3.2 The Proto-Sequence Decoder", and "3.3 The Critic", explaining their functions and interactions. However, it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps formatted as a formal algorithm.
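For context, the sequence-level control flow the paper describes in prose could be sketched roughly as follows. This is a hypothetical illustration only, not the authors' code: every name, shape, and the stand-in heuristics are invented; it merely shows the Actor → Proto-Sequence Decoder → Critic interaction at the level a formal algorithm block would capture.

```python
# Hypothetical sketch of the GPS decision loop described in Sections 3.1-3.3.
# All identifiers (actor, psd, critic, LATENT_DIM, ACTIONS) are invented for
# illustration; the real components are learned neural networks.
import random

ACTIONS = ["up", "down", "left", "right"]  # assumed maze primitives
LATENT_DIM = 8                             # invented latent size

def actor(state):
    """Map a state to a proto-sequence latent vector (stand-in: pseudo-random)."""
    rng = random.Random(hash(state))
    return [rng.uniform(-1, 1) for _ in range(LATENT_DIM)]

def psd(proto, max_len=5):
    """Decode a proto-sequence latent into a sequence of primitive actions."""
    return [ACTIONS[int(abs(z) * len(ACTIONS)) % len(ACTIONS)]
            for z in proto[:max_len]]

def critic(state, proto):
    """Score a (state, proto-sequence) pair (stand-in: mean of the latent)."""
    return sum(proto) / len(proto)

def gps_step(env_state):
    """One sequence-level decision: commit to a whole action sequence at once."""
    proto = actor(env_state)
    actions = psd(proto)
    value = critic(env_state, proto)
    return actions, value

actions, value = gps_step("maze-cell-(0,0)")
print(actions, round(value, 3))
```

The point of the sketch is the granularity: unlike step-by-step RL, a single decision here yields an entire action sequence, which is what makes the method sequence-level.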
Open Source Code — Yes. "We make our code publicly available." [Footnote: "Code available at https://github.com/liadgiladi/Generative-Proto-Sequence"]
Open Datasets — Yes. "Our code, as well as the mazes generated for our evaluation, are available in the appendix. [...] We generated our maze environments using LLM. [...] Following these criteria, we generated a total of 400 unique action sequences. These sequences form the basis of the training data for the Proto-Sequence Decoder (PSD)."
Dataset Splits — Yes. "All information on our generated mazes is presented in Table 1. For each maze size and type, the table presents: a) the sizes of our training, validation, and test sets, b) the range for the distance between the start and goal positions, and c) the average length of the optimal path."
Hardware Specification — Yes. "All experiments were conducted on a system running Red Hat 5.14 with x86_64 architecture. We used an NVIDIA RTX 2080 GPU with 8GB of VRAM. [...] The VAE decoder requires approximately 10-15 minutes of pre-training on an Apple M1 Max with 64 GB RAM."
Software Dependencies — No. The paper mentions software components such as the Adam optimizer (torch.optim.Adam, Appendix J.4), torch.nn.functional.mse_loss (J.6), and stable_baselines3.common.buffers.ReplayBuffer (J.5). However, specific version numbers for these libraries (e.g., PyTorch, Stable Baselines3), which are necessary for a reproducible description of ancillary software, are not provided.
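The version gap flagged above is straightforward to close. One standard-library approach is to emit pinned `name==version` lines for exactly the distributions the paper relies on (the package names below are inferred from the module paths the paper cites, i.e. `torch` and `stable-baselines3`; this is a generic sketch, not part of the authors' codebase):

```python
# Generate pinned requirements for the ancillary libraries named in the paper.
# Uses only the standard library; packages not present are noted rather than
# raising, so the same snippet works in any environment.
from importlib import metadata

def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, noting absent ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines

print("\n".join(pinned_requirements(["torch", "stable-baselines3"])))
```

Committing the output of such a snippet (or an equivalent `pip freeze`) alongside the released code would satisfy this reproducibility criterion.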
Experiment Setup — Yes. "Unless otherwise noted for specific ablation studies, experiments were conducted using a common set of key hyperparameters, summarized in Tables 14, 19, and 22 in the appendix. We selected the values based on preliminary experiments and common practices."