Generative Proto-Sequence: Sequence-Level Decision Making for Long-Horizon Reinforcement Learning

Authors: Netanel Fried, Liad Giladi, Gilad Katz

TMLR 2025

Reproducibility Variable — Result — LLM Response
Research Type — Experimental. "Evaluations across diverse maze navigation tasks of varying sizes and complexities demonstrate that GPS outperforms leading action repetition and temporal methods in the large majority of tested configurations, where it converges faster and achieves higher success rates. [...] We provide empirical results on challenging maze benchmarks, showing improvements over top-performing action repetition and temporal methods baselines in metrics such as convergence speed and success rate."
Researcher Affiliation — Academia. Netanel Fried (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Liad Giladi (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev; Gilad Katz (EMAIL), Department of Computer Science and Information, Ben-Gurion University of the Negev.
Pseudocode — No. The paper describes the architecture and components in detail across sections such as "3.1 The Actor", "3.2 The Proto-Sequence Decoder", and "3.3 The Critic", explaining their functions and interactions. However, it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps formatted as a formal algorithm.
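For context, the sequence-level control flow the paper describes in prose could be sketched roughly as follows. This is a hypothetical illustration only, not the authors' code: every name, shape, and the stand-in heuristics are invented; it merely shows the Actor → Proto-Sequence Decoder → Critic interaction at the level a formal algorithm block would capture.

```python
# Hypothetical sketch of the GPS decision loop described in Sections 3.1-3.3.
# All identifiers (actor, psd, critic, LATENT_DIM, ACTIONS) are invented for
# illustration; the real components are learned neural networks.
import random

ACTIONS = ["up", "down", "left", "right"]  # assumed maze primitives
LATENT_DIM = 8                             # invented latent size

def actor(state):
    """Map a state to a proto-sequence latent vector (stand-in: pseudo-random)."""
    rng = random.Random(hash(state))
    return [rng.uniform(-1, 1) for _ in range(LATENT_DIM)]

def psd(proto, max_len=5):
    """Decode a proto-sequence latent into a sequence of primitive actions."""
    return [ACTIONS[int(abs(z) * len(ACTIONS)) % len(ACTIONS)]
            for z in proto[:max_len]]

def critic(state, proto):
    """Score a (state, proto-sequence) pair (stand-in: mean of the latent)."""
    return sum(proto) / len(proto)

def gps_step(env_state):
    """One sequence-level decision: commit to a whole action sequence at once."""
    proto = actor(env_state)
    actions = psd(proto)
    value = critic(env_state, proto)
    return actions, value

actions, value = gps_step("maze-cell-(0,0)")
print(actions, round(value, 3))
```

The point of the sketch is the granularity: unlike step-by-step RL, a single decision here yields an entire action sequence, which is what makes the method sequence-level.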
Open Source Code — Yes. "We make our code publicly available." [Footnote: "Code available at https://github.com/liadgiladi/Generative-Proto-Sequence"]
Open Datasets — Yes. "Our code, as well as the mazes generated for our evaluation, are available in the appendix. [...] We generated our maze environments using LLM. [...] Following these criteria, we generated a total of 400 unique action sequences. These sequences form the basis of the training data for the Proto-Sequence Decoder (PSD)."
Dataset Splits — Yes. "All information on our generated mazes is presented in Table 1. For each maze size and type, the table presents: a) the sizes of our training, validation, and test sets, b) the range for the distance between the start and goal positions, and c) the average length of the optimal path."
Hardware Specification — Yes. "All experiments were conducted on a system running Red Hat 5.14 with x86_64 architecture. We used an NVIDIA RTX 2080 GPU with 8GB of VRAM. [...] The VAE decoder requires approximately 10-15 minutes of pre-training on an Apple M1 Max with 64 GB RAM."
Software Dependencies — No. The paper mentions software components such as the Adam optimizer (torch.optim.Adam, Appendix J.4), torch.nn.functional.mse_loss (J.6), and stable_baselines3.common.buffers.ReplayBuffer (J.5). However, specific version numbers for these libraries (e.g., PyTorch, Stable Baselines3), which are necessary for a reproducible description of ancillary software, are not provided.
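The version gap flagged above is straightforward to close. One standard-library approach is to emit pinned `name==version` lines for exactly the distributions the paper relies on (the package names below are inferred from the module paths the paper cites, i.e. `torch` and `stable-baselines3`; this is a generic sketch, not part of the authors' codebase):

```python
# Generate pinned requirements for the ancillary libraries named in the paper.
# Uses only the standard library; packages not present are noted rather than
# raising, so the same snippet works in any environment.
from importlib import metadata

def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, noting absent ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines

print("\n".join(pinned_requirements(["torch", "stable-baselines3"])))
```

Committing the output of such a snippet (or an equivalent `pip freeze`) alongside the released code would satisfy this reproducibility criterion.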
Experiment Setup — Yes. "Unless otherwise noted for specific ablation studies, experiments were conducted using a common set of key hyperparameters, summarized in Tables 14, 19, and 22 in the appendix. We selected the values based on preliminary experiments and common practices."