Discrete Neural Algorithmic Reasoning
Authors: Gleb Rodionov, Liudmila Prokhorenkova
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform experiments to evaluate how the proposed discretization affects the performance of neural reasoners on diverse algorithmic reasoning tasks. Our main questions are: 1. Can the proposed discrete neural reasoners capture the desired algorithmic dynamics with hint supervision? 2. How does discretization affect OOD and size-generalization performance of neural reasoners? 3. Is the proposed model capable of multi-task learning? Additionally, we are interested in whether discrete neural reasoners can be learned without hints and how they tend to utilize the given number of node and edge states to solve the problem. We discuss no-hint experiments separately in Section 6. Finally, we investigate whether graph sizes in the train set can be reduced without compromising generalization performance; see Appendix B for details. |
| Researcher Affiliation | Industry | 1Yandex Research, Moscow, Russia 2Yandex Research, Amsterdam, The Netherlands. Correspondence to: Gleb Rodionov <EMAIL>. |
| Pseudocode | Yes | First, recall the pseudocode of the algorithm:<br>Starting node: visited<br>All other nodes: not visited<br>for step in range(T) do<br>&nbsp;&nbsp;for node U in a graph do<br>&nbsp;&nbsp;&nbsp;&nbsp;if U is visited on previous steps then continue end if<br>&nbsp;&nbsp;&nbsp;&nbsp;if U has a neighbor P that was visited on previous steps then<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;U: visited on this step<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;U selects the smallest-indexed such neighbor P as parent:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Edge (U, P): pointer<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Self-loop (U, U): not pointer<br>&nbsp;&nbsp;&nbsp;&nbsp;end if<br>&nbsp;&nbsp;end for<br>end for<br>return a BFS tree described by pointers |
| Open Source Code | Yes | A complete list of all hyperparameters is provided in our source code.2 2https://github.com/yandex-research/dnar |
| Open Datasets | Yes | We perform our experiments on the problems from the recently proposed SALSA-CLRS benchmark (Minder et al., 2023), namely BFS, DFS, Prim, Dijkstra, Maximum Independent Set (MIS), and Eccentricity. We believe that the proposed method is not limited to the covered problems, but we leave the implementation of the required data flows (e.g., edge-based reasoning (Ibarz et al., 2022), graph-level hints, interactions between different scalars) for future work. |
| Dataset Splits | Yes | The train dataset of SALSA-CLRS consists of random graphs with at most 16 nodes sampled from the Erdős–Rényi (ER) distribution with parameter p chosen to be as low as possible such that graphs remain connected with high probability. The test set consists of sparse graphs of sizes from 16 to 1600 nodes. |
| Hardware Specification | Yes | Our models are trained on a single A100 GPU, requiring less than 1 hour for single-task and 5-6 hours for multitask training. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Gumbel-Softmax (Jang et al., 2017) and softmax temperature annealing. However, it does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We train each model for 1000 optimization steps using the Adam optimizer with a learning rate of η = 0.001 and a batch size of 32 graphs; teacher forcing is applied throughout training. For the discrete bottlenecks (attention weights and Scalar Update operations), we anneal the softmax temperature geometrically from 3.0 to 0.01, decreasing it at each training step. A complete list of all hyperparameters is provided in our source code.2 For multitask experiments, we adopt the setup from Ibarz et al. (2022) and train a single processor with task-dependent encoders/decoders to imitate all covered algorithms simultaneously. We make 10000 optimization steps on the accumulated gradients across each task and keep all hyperparameters the same as in the single-task setting. |
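The BFS pseudocode quoted in the table (layer-by-layer expansion where each newly visited node picks its smallest-indexed already-visited neighbor as parent) can be sketched in plain Python; names like `bfs_tree` and `adj` are illustrative and not from the paper's code:

```python
def bfs_tree(adj, start, T=None):
    """Parallel BFS as in the quoted pseudocode.

    adj: dict mapping node -> list of neighbors.
    Returns parent pointers; the start node points to itself,
    mirroring the pseudocode's self-loop case.
    """
    T = len(adj) if T is None else T
    visited = {start}
    parent = {start: start}
    for _ in range(T):
        newly = {}
        for u in adj:
            if u in visited:
                continue  # "if U is visited on previous steps then continue"
            # neighbors visited on previous steps
            prev = [p for p in adj[u] if p in visited]
            if prev:
                # smallest-indexed such neighbor becomes the parent
                newly[u] = min(prev)
        # commit the layer at once so all updates use "previous steps" only
        for u, p in newly.items():
            visited.add(u)
            parent[u] = p
    return parent
```

Committing each layer after the inner loop matters: it keeps every node's decision based only on nodes visited on *previous* steps, exactly as the pseudocode requires.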
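The Dataset Splits row says p is "chosen to be as low as possible such that graphs remain connected with high probability". One plausible reading, sketched below, is to set p slightly above the classical ER connectivity threshold ln(n)/n and resample until the graph is connected; the exact p and sampling procedure used by SALSA-CLRS may differ, and `connected_er_graph` is a hypothetical helper:

```python
import math
import random

def is_connected(n, edges):
    """DFS connectivity check over an edge list on nodes 0..n-1."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

def connected_er_graph(n, c=1.5, seed=None):
    """Sample G(n, p) with p = c * ln(n) / n, retrying until connected."""
    rng = random.Random(seed)
    p = min(1.0, c * math.log(n) / n)  # just above the connectivity threshold
    while True:
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < p]
        if is_connected(n, edges):
            return edges
```

With c close to 1 the graphs stay sparse, which matches the benchmark's emphasis on sparse test graphs.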
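The Experiment Setup row anneals the softmax temperature geometrically from 3.0 to 0.01, decreasing at every one of the 1000 training steps. A minimal sketch of such a schedule follows; whether the paper hits the endpoints exactly at the first and last step is an assumption here:

```python
def geometric_schedule(t_start=3.0, t_end=0.01, steps=1000):
    """Geometric (exponential) decay from t_start to t_end over `steps` steps.

    Each step multiplies the temperature by a constant ratio, so the
    schedule is linear in log-space and strictly decreasing.
    """
    ratio = (t_end / t_start) ** (1.0 / (steps - 1))
    return [t_start * ratio ** i for i in range(steps)]
```

In training, the step-t temperature would divide the logits before the softmax (or serve as the Gumbel-Softmax tau), so early steps are soft and late steps approach the discrete argmax.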