Generative Proto-Sequence: Sequence-Level Decision Making for Long-Horizon Reinforcement Learning
Authors: Netanel Fried, Liad Giladi, Gilad Katz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations across diverse maze navigation tasks of varying sizes and complexities demonstrate that GPS outperforms leading action repetition and temporal methods in the large majority of tested configurations, where it converges faster and achieves higher success rates. [...] We provide empirical results on challenging maze benchmarks, showing improvements over top-performing action repetition and temporal method baselines in metrics such as convergence speed and success rate. |
| Researcher Affiliation | Academia | Netanel Fried (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Liad Giladi (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Gilad Katz (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev |
| Pseudocode | No | The paper describes the architecture and components in detail across sections like '3.1 The Actor', '3.2 The Proto-Sequence Decoder', and '3.3 The Critic', explaining their functions and interactions. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like a formal algorithm. |
| Open Source Code | Yes | We make our code publicly available1. [...] 1Code available at https://github.com/liadgiladi/Generative-Proto-Sequence |
| Open Datasets | Yes | Our code, as well as the mazes generated for our evaluation, are available in the appendix. [...] We generated our maze environments using LLM. [...] Following these criteria, we generated a total of 400 unique action sequences. These sequences form the basis of the training data for the Proto-Sequence Decoder (PSD). |
| Dataset Splits | Yes | All information on our generated mazes is presented in Table 1. For each maze size and type, the table presents: a) the sizes of our training, validation, and test sets, b) the range for the distance between the start and goal positions, and c) the average length of the optimal path. |
| Hardware Specification | Yes | All experiments were conducted on a system running Red Hat 5.14 with x86_64 architecture. We used an NVIDIA RTX 2080 GPU with 8GB of VRAM. [...] The VAE decoder requires approximately 10-15 minutes of pre-training on an Apple M1 Max with 64 GB RAM. |
| Software Dependencies | No | The paper mentions software components such as 'Adam optimizer (torch.optim.Adam)' (J.4), 'torch.nn.functional.mse_loss' (J.6), and 'stable_baselines3.common.buffers.ReplayBuffer' (J.5). However, it does not provide version numbers for these libraries (e.g., PyTorch, Stable Baselines3), which are necessary for a reproducible description of the ancillary software. |
| Experiment Setup | Yes | Unless otherwise noted for specific ablation studies, experiments were conducted using a common set of key hyperparameters, summarized in Tables 14, 19, and 22 in the appendix. We selected the values based on preliminary experiments and common practices. |
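The missing library versions flagged under Software Dependencies can be captured programmatically at experiment time. The sketch below, a suggestion not taken from the paper, uses Python's standard `importlib.metadata` to record the installed version of each dependency (the package names listed are illustrative):

```python
from importlib import metadata


def dependency_report(packages):
    """Return the installed version of each package, or a marker if absent.

    Logging this dictionary alongside experiment results documents the
    software environment for reproducibility.
    """
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Example: record the libraries the paper references, plus common extras.
print(dependency_report(["torch", "stable-baselines3", "gymnasium"]))
```

Saving this report (e.g., as JSON next to the training logs) would let readers reconstruct the exact PyTorch and Stable Baselines3 versions used.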