Discrete Neural Algorithmic Reasoning

Authors: Gleb Rodionov, Liudmila Prokhorenkova

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform experiments to evaluate how the proposed discretization affects the performance of neural reasoners on diverse algorithmic reasoning tasks. Our main questions are: 1. Can the proposed discrete neural reasoners capture the desired algorithmic dynamics with hint supervision? 2. How does discretization affect OOD and size-generalization performance of neural reasoners? 3. Is the proposed model capable of multi-task learning? Additionally, we are interested in whether discrete neural reasoners can be learned without hints and how they tend to utilize the given number of node and edge states to solve the problem. We discuss no-hint experiments separately in Section 6. Finally, we investigate whether graph sizes in the train set can be reduced without compromising generalization performance; see Appendix B for details.
Researcher Affiliation | Industry | ¹Yandex Research, Moscow, Russia; ²Yandex Research, Amsterdam, The Netherlands. Correspondence to: Gleb Rodionov <EMAIL>.
Pseudocode | Yes | First, recall the pseudocode of the algorithm:

Starting node: visited
All other nodes: not visited
for step in range(T) do
    for node U in a graph do
        if U was visited on previous steps then
            continue
        end if
        if U has a neighbor P that was visited on previous steps then
            U: visited on this step
            U selects the smallest-indexed such neighbor P as parent:
                Edge (U, P): pointer
                Self-loop (U, U): not pointer
        end if
    end for
end for
return a BFS tree described by pointers
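The pseudocode above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the adjacency-dict representation and the helper name are assumptions.

```python
def bfs_tree_pointers(adj, start, T):
    """Run T synchronous steps of the BFS-tree pseudocode.

    adj: dict mapping each node to a list of neighbor indices.
    Returns a dict `parent` describing the BFS tree: parent[start] = start
    (the self-loop pointer at the root), and parent[u] is the
    smallest-indexed previously visited neighbor that discovered u.
    """
    visited = {start}
    parent = {start: start}  # self-loop (U, U) pointer at the root
    for _ in range(T):
        newly_visited = {}
        for u in adj:
            if u in visited:
                continue
            # neighbors visited on *previous* steps (synchronous semantics:
            # `visited` is only updated after the inner loop finishes)
            prev = [p for p in adj[u] if p in visited]
            if prev:
                newly_visited[u] = min(prev)  # smallest-indexed parent
        visited.update(newly_visited)
        parent.update(newly_visited)
    return parent
```

For example, on the path graph 0–1–2 started at node 0 with T = 2, every node is discovered and each pointer names the neighbor one step closer to the root.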
Open Source Code | Yes | A complete list of all hyperparameters is provided in our source code: https://github.com/yandex-research/dnar
Open Datasets | Yes | We perform our experiments on the problems from the recently proposed SALSA-CLRS benchmark (Minder et al., 2023), namely BFS, DFS, Prim, Dijkstra, Maximum Independent Set (MIS), and Eccentricity. We believe that the proposed method is not limited to the covered problems, but we leave the implementation of the required data flows (e.g., edge-based reasoning (Ibarz et al., 2022), graph-level hints, interactions between different scalars) for future work.
Dataset Splits | Yes | The train dataset of SALSA-CLRS consists of random graphs with at most 16 nodes, sampled from the Erdős–Rényi (ER) distribution with parameter p chosen to be as low as possible such that graphs remain connected with high probability. The test set consists of sparse graphs with sizes ranging from 16 to 1600 nodes.
Hardware Specification | Yes | Our models are trained on a single A100 GPU, requiring less than 1 hour for single-task and 5-6 hours for multi-task training.
Software Dependencies | No | The paper mentions using the Adam optimizer, Gumbel-Softmax (Jang et al., 2017), and softmax temperature annealing. However, it does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We train each model for 1000 optimization steps using the Adam optimizer with a learning rate of η = 0.001 and a batch size of 32 graphs; teacher forcing is applied throughout training. For the discrete bottlenecks (attention weights and Scalar Update operations), we anneal the softmax temperature geometrically from 3.0 to 0.01, decreasing it at each training step. A complete list of all hyperparameters is provided in our source code. For multi-task experiments, we adopt the setup from Ibarz et al. (2022) and train a single processor with task-dependent encoders/decoders to imitate all covered algorithms simultaneously. We make 10000 optimization steps on the accumulated gradients across each task and keep all other hyperparameters the same as in the single-task setting.
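The geometric annealing schedule can be written in a few lines. This is a sketch under one assumption: the temperature decays by a constant multiplicative factor per step so that it hits exactly 3.0 at step 0 and 0.01 at the last step; the quoted setup only states the endpoints and that the schedule is geometric.

```python
def geometric_temperature(step, total_steps=1000, tau_start=3.0, tau_end=0.01):
    """Softmax temperature at a given training step under geometric annealing.

    Multiplies the temperature by a constant ratio each step, so it moves
    from tau_start at step 0 to tau_end at step total_steps - 1.
    """
    fraction = step / (total_steps - 1)
    return tau_start * (tau_end / tau_start) ** fraction
```

A high initial temperature keeps the discrete bottlenecks (attention weights and Scalar Update choices) soft and trainable early on, while the low final temperature pushes them toward hard, near-one-hot decisions by the end of training.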