Learning Randomized Algorithms with Transformers

Authors: Johannes von Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then show that common optimization techniques, such as gradient descent or evolutionary strategies, can effectively learn transformer parameters that make use of the randomness provided to the model. To illustrate the broad applicability of randomization in empowering neural networks, we study three conceptual tasks: associative recall, graph coloring, and agents that explore grid worlds. In addition to demonstrating increased robustness against oblivious adversaries through learned randomization, our experiments reveal remarkable performance improvements due to the inherently random nature of the neural network's computation and predictions.
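The core idea quoted above, providing randomness as an input the network can learn to use, can be illustrated with a minimal NumPy sketch. This is not the paper's transformer architecture: the toy two-layer network, its dimensions, and the untrained weights are all illustrative assumptions; it only shows how a fresh noise vector per forward pass makes the prediction itself a random variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "randomized network": a fixed input x is concatenated with a fresh
# noise vector z on every forward pass, so the prediction is random.
# (Illustrative sketch only; the paper studies learned transformers.)
W1 = 0.5 * rng.normal(size=(8, 16))  # first layer: (4 input + 4 noise) -> 16
W2 = 0.5 * rng.normal(size=(16, 3))  # second layer: 16 -> 3 class logits

def randomized_forward(x, rng):
    z = rng.normal(size=4)                    # randomness provided to the model
    h = np.tanh(np.concatenate([x, z]) @ W1)  # the network may use or ignore z
    return int(np.argmax(h @ W2))             # prediction varies with z

x = np.array([1.0, -0.5, 0.3, 0.0])
preds = [randomized_forward(x, rng) for _ in range(100)]
print(set(preds))  # distinct classes can appear for the same fixed input
```

Training (by gradient descent or evolutionary search, as the paper describes) then shapes how the network exploits z, e.g. to randomize its choices against an oblivious adversary.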
Researcher Affiliation | Collaboration | Johannes von Oswald 1, Seijin Kobayashi 1,2, Yassir Akram 2, Angelika Steger 2. 1 Paradigms of Intelligence Team, Google; 2 Department of Computer Science, ETH Zürich.
Pseudocode | No | The paper describes theoretical considerations and specific algorithmic steps in paragraph form, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures with structured pseudocode.
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available or released.
Open Datasets | No | The paper mentions 'Randomly generated binary value vectors' for associative recall, 'Random permutation of graph indices' for graph coloring, and describes the setup for grid worlds. It does not provide concrete access information (links, DOIs, specific citations with author/year, or established benchmark dataset names) for publicly available datasets used in its experiments.
Dataset Splits | No | The paper describes how the data is generated or structured for each task (e.g., 'Context size Variable size from 8-20', 'N = 10 vertices', '5x5 grid'). However, it does not explicitly provide dataset splits (e.g., percentages or counts for training, validation, and test sets) for reproducibility.
Hardware Specification | Yes | We provide here additional results and training details to reproduce the experiments one by one, as well as a proof of Proposition 1. We first give a short overview of the compute budget used for this project. For the associative recall task as well as the graph coloring problem, we estimate a total compute budget of four Nvidia RTX 4090 GPUs for a month. For the grid world problem, we use a cluster of 16 A100 GPUs, which we used to scan over the hyperparameters on and off over a total of two weeks.
Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2015) and Parameter-exploring Policy Gradients (PEPG) (Sehnke et al., 2010), which are algorithms/methods. However, it does not list any specific software libraries or programming languages with version numbers (e.g., Python 3.8, PyTorch 1.9) required for replication.
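For readers unfamiliar with PEPG, the evolutionary strategy cited above, here is a minimal symmetric-sampling sketch in the spirit of Sehnke et al. (2010). The toy quadratic reward, the learning rates, and the clipping bounds are all illustrative assumptions, not values from the paper; a faithful replication would follow the original PEPG formulation and the paper's hyperparameter tables.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])

def reward(theta):
    # Toy objective: maximized (value 0) when theta equals target.
    return -np.sum((theta - target) ** 2)

mu = np.zeros(3)       # mean of the Gaussian search distribution
sigma = np.ones(3)     # per-parameter exploration std deviations
lr_mu, lr_sigma = 0.05, 0.01
baseline = reward(mu)  # moving reward baseline

for _ in range(300):
    eps = rng.normal(size=3) * sigma
    r_plus, r_minus = reward(mu + eps), reward(mu - eps)
    r_mean = (r_plus + r_minus) / 2
    # Symmetric-sampling updates: mu moves along eps weighted by the reward
    # difference; sigma grows where exploring beyond it paid off.
    mu += lr_mu * (r_plus - r_minus) / 2 * eps
    sigma += lr_sigma * (r_mean - baseline) * (eps ** 2 - sigma ** 2) / sigma
    sigma = np.clip(sigma, 0.05, 1.0)         # keep stds positive and bounded
    baseline = 0.9 * baseline + 0.1 * r_mean  # exponential moving baseline

print(mu)  # should have drifted from the origin toward target
```

Because PEPG needs only reward evaluations, not gradients, it can train the randomized networks above even when sampling makes the loss non-differentiable.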
Experiment Setup | Yes | Table 1: Hyperparameters for the associative recall task. Table 2: Hyperparameters for the graph coloring task. Table 3: Hyperparameters for the grid world task.