Discrete Neural Algorithmic Reasoning

Authors: Gleb Rodionov, Liudmila Prokhorenkova

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform experiments to evaluate how the proposed discretization affects the performance of neural reasoners on diverse algorithmic reasoning tasks. Our main questions are: 1. Can the proposed discrete neural reasoners capture the desired algorithmic dynamics with hint supervision? 2. How does discretization affect OOD and size-generalization performance of neural reasoners? 3. Is the proposed model capable of multi-task learning? Additionally, we are interested in whether discrete neural reasoners can be learned without hints and how they tend to utilize the given number of node and edge states to solve the problem. We discuss no-hint experiments separately in Section 6. Finally, we investigate whether graph sizes in the train set can be reduced without compromising generalization performance; see Appendix B for details.
Researcher Affiliation | Industry | ¹Yandex Research, Moscow, Russia; ²Yandex Research, Amsterdam, The Netherlands. Correspondence to: Gleb Rodionov <EMAIL>.
Pseudocode | Yes | First, recall the pseudocode of the algorithm:

Starting node: visited
All other nodes: not visited
for step in range(T) do
    for node U in a graph do
        if U was visited on previous steps then
            continue
        end if
        if U has a neighbor P that was visited on previous steps then
            U: visited on this step
            U selects the smallest-indexed such neighbor P as parent:
                Edge (U, P): pointer
                Self-loop (U, U): not pointer
        end if
    end for
end for
return a BFS tree described by pointers
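The pseudocode above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the adjacency-dict representation and the helper name are assumptions.

```python
def bfs_tree_pointers(adj, start, T):
    """Run T synchronous steps of the BFS-tree pseudocode.

    adj: dict mapping each node to a list of neighbor indices.
    Returns a dict `parent` describing the BFS tree: parent[start] = start
    (the self-loop pointer at the root), and parent[u] is the
    smallest-indexed previously visited neighbor that discovered u.
    """
    visited = {start}
    parent = {start: start}  # self-loop (U, U) pointer at the root
    for _ in range(T):
        newly_visited = {}
        for u in adj:
            if u in visited:
                continue
            # neighbors visited on *previous* steps (synchronous semantics:
            # `visited` is only updated after the inner loop finishes)
            prev = [p for p in adj[u] if p in visited]
            if prev:
                newly_visited[u] = min(prev)  # smallest-indexed parent
        visited.update(newly_visited)
        parent.update(newly_visited)
    return parent
```

For example, on the path graph 0–1–2 started at node 0 with T = 2, every node is discovered and each pointer names the neighbor one step closer to the root.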
Open Source Code | Yes | A complete list of all hyperparameters is provided in our source code: https://github.com/yandex-research/dnar
Open Datasets | Yes | We perform our experiments on the problems from the recently proposed SALSA-CLRS benchmark (Minder et al., 2023), namely BFS, DFS, Prim, Dijkstra, Maximum Independent Set (MIS), and Eccentricity. We believe that the proposed method is not limited to the covered problems, but we leave the implementation of the required data flows (e.g., edge-based reasoning (Ibarz et al., 2022), graph-level hints, interactions between different scalars) for future work.
Dataset Splits | Yes | The train dataset of SALSA-CLRS consists of random graphs with at most 16 nodes, sampled from the Erdős–Rényi (ER) distribution with parameter p chosen to be as low as possible such that graphs remain connected with high probability. The test set consists of sparse graphs with sizes ranging from 16 to 1600 nodes.
Hardware Specification | Yes | Our models are trained on a single A100 GPU, requiring less than 1 hour for single-task and 5-6 hours for multi-task training.
Software Dependencies | No | The paper mentions using the Adam optimizer, Gumbel-Softmax (Jang et al., 2017), and softmax temperature annealing. However, it does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We train each model for 1000 optimization steps using the Adam optimizer with a learning rate of η = 0.001 and a batch size of 32 graphs; teacher forcing is applied throughout training. For the discrete bottlenecks (attention weights and Scalar Update operations), we anneal the softmax temperature geometrically from 3.0 to 0.01, decreasing it at each training step. A complete list of all hyperparameters is provided in our source code. For multi-task experiments, we adopt the setup from Ibarz et al. (2022) and train a single processor with task-dependent encoders/decoders to imitate all covered algorithms simultaneously. We make 10000 optimization steps on the accumulated gradients across each task and keep all other hyperparameters the same as in the single-task setting.
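The geometric annealing schedule can be written in a few lines. This is a sketch under one assumption: the temperature decays by a constant multiplicative factor per step so that it hits exactly 3.0 at step 0 and 0.01 at the last step; the quoted setup only states the endpoints and that the schedule is geometric.

```python
def geometric_temperature(step, total_steps=1000, tau_start=3.0, tau_end=0.01):
    """Softmax temperature at a given training step under geometric annealing.

    Multiplies the temperature by a constant ratio each step, so it moves
    from tau_start at step 0 to tau_end at step total_steps - 1.
    """
    fraction = step / (total_steps - 1)
    return tau_start * (tau_end / tau_start) ** fraction
```

A high initial temperature keeps the discrete bottlenecks (attention weights and Scalar Update choices) soft and trainable early on, while the low final temperature pushes them toward hard, near-one-hot decisions by the end of training.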