What Makes a Good Feedforward Computational Graph?

Authors: Alex Vitvitskyi, João Guilherme Madeira Araújo, Marc Lackenby, Petar Veličković

ICML 2025

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
    Our study is backed both by theoretical analyses of the metrics' asymptotic behaviour for various graphs and by correlating these metrics with the performance of trained neural network models that use the corresponding graphs.

Researcher Affiliation: Collaboration
    1 Google DeepMind, 2 University of Oxford. Correspondence to: Alex Vitvitskyi <EMAIL>.

Pseudocode: Yes
    edges ← 1
    j ← i − 1
    while j > 0 and edges < budget do
        ρ ∼ U(0, 1)
        if ρ > p then
            E ← E ∪ {(j, i)}
            edges ← edges + 1
        end if
        j ← j − 1
    end while

Open Source Code: No
    The paper does not explicitly state that source code for the methodology is provided, nor does it include any links to a code repository.

Open Datasets: Yes
    "As an indication of the utility of various feedforward graphs in the setting where nodes correspond to natural language tokens, we also fine-tuned Gemma 2B (Team et al., 2024a), utilising these graphs as attention masks across all Transformer layers, on the standard Wikipedia dataset (https://www.tensorflow.org/datasets/catalog/wikipedia) containing texts obtained from Wikipedia database dumps."

Dataset Splits: No
    The paper mentions training on lengths up to 256 and testing on sequences up to 1,024 elements, and details training steps and batch sizes. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for any of the tasks or datasets used.

Hardware Specification: No
    The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.

Software Dependencies: No
    The paper mentions optimizers such as AdamW, LaProp (Ziyin et al., 2021), and RMSClip (Shazeer & Stern, 2018) and provides their hyperparameters, but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, or TensorFlow versions).

Experiment Setup: Yes
    "For maximum tasks... We use the cross-entropy loss function and the AdamW optimiser with a 10⁻³ learning rate and a batch size of 256. Training is performed over 10,000 steps. For Parity... with only 8 layers and using standard multi-head attention with 8 heads; the vocabulary size was 2... and the embedding dimension was 256. To train the model we used the LaProp optimizer... for 1 million steps, with a batch size of 128 sequences. Our hyperparameters were: learning rate 1 × 10⁻³, β₁ = 0.9, β₂ = 0.9, weight decay 5 × 10⁻⁴, RMSClip s d 1."
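The edge-sampling pseudocode reported above can be sketched as runnable Python. This is a minimal sketch, not the authors' code: the function name `sample_feedforward_edges` is an assumption, and the edge collection here starts empty (the pseudocode initialises its counter at 1), while the loop structure mirrors the reported algorithm, walking candidate predecessors j = i − 1 down to 1 and accepting edge (j, i) with probability 1 − p until the budget is exhausted.

```python
import random

def sample_feedforward_edges(i, p, budget, rng=random):
    """Sketch of the reported sampling loop for node i: scan earlier
    nodes j = i-1, ..., 1 and keep edge (j, i) when a uniform draw
    exceeds p, stopping once `budget` edges have been collected."""
    edges = []          # assumption: start from an empty edge set
    j = i - 1
    while j > 0 and len(edges) < budget:
        rho = rng.random()          # rho ~ U(0, 1)
        if rho > p:                 # accept with probability 1 - p
            edges.append((j, i))
        j -= 1
    return edges
```

With p = 0 every candidate edge is (almost surely) kept, so the budget controls in-degree; with p close to 1 the sampled graph becomes sparse.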
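For the Gemma 2B fine-tuning described under Open Datasets, the feedforward graph is applied as an attention mask across all Transformer layers. A hedged illustration of that idea, assuming 0-indexed tokens and a hypothetical helper name (the paper does not release this code):

```python
import numpy as np

def edges_to_attention_mask(n, edges):
    """Hypothetical helper: convert a set of feedforward edges (j, i),
    with j < i meaning token i may attend to token j, into a boolean
    attention mask of shape (n, n). Self-attention is always allowed."""
    mask = np.eye(n, dtype=bool)    # assumption: keep the diagonal
    for j, i in edges:
        mask[i, j] = True           # query i may attend to key j
    return mask
```

Such a mask would then be broadcast over heads and batch when masking attention logits; that wiring detail is model-specific and not shown here.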