Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks

Authors: Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao

TMLR 2023

Reproducibility assessment (each entry gives the variable, the assessed result, and the supporting LLM response):
Research Type: Experimental
LLM Response: First, we argue that OOD generalization in this setting is significantly different than common OOD settings. For example, some phenomena in OOD generalization of image classifications such as accuracy on the line are not observed here, and techniques such as data augmentation methods do not help as assumptions underlying many augmentation techniques are often violated. Second, we analyze the main challenges (e.g., input distribution shift, non-representative data generation, and uninformative validation metrics) of the current leading benchmark, i.e., CLRS (Veličković et al., 2021), which contains 30 algorithmic reasoning tasks. We propose several solutions, including a simple-yet-effective fix to the input distribution shift and improved data generation. Finally, we propose an attention-based 2WL-graph neural network (GNN) processor which complements message-passing GNNs so their combination outperforms the state-of-the-art model by a 3% margin averaged over all algorithms. Our code is available at: https://github.com/smahdavi4/clrs.
Researcher Affiliation: Collaboration
LLM Response: Sadegh Mahdavi (University of British Columbia); Kevin Swersky (Google Research, Brain Team); Thomas Kipf (Google Research, Brain Team); Milad Hashemi (Google Research, Brain Team); Christos Thrampoulidis (University of British Columbia); Renjie Liao (University of British Columbia)
Pseudocode: No
LLM Response: The paper describes the 'Encode-Process-Decode' framework and provides a simplified formula (Eq. 1) for the processor: H^(t+1) = GNN([H^(t), Z_inp], A). It explains the process for Breadth-first Search (BFS) in detail. However, it does not contain a distinct, clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps in a code-like format.
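The recurrence in Eq. (1) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `gnn_step` and `decode` are hypothetical stand-ins for the processor and decoder networks, and the zero initialization of the hidden states is an assumption.

```python
import numpy as np

def encode_process_decode(z_inp, adj, gnn_step, decode, num_steps=32):
    """Roll out the processor recurrence H^(t+1) = GNN([H^(t), Z_inp], A).

    z_inp:    encoded inputs Z_inp, shape (num_nodes, d)
    adj:      adjacency matrix A, shape (num_nodes, num_nodes)
    gnn_step: callable (node_features, adj) -> hidden states (num_nodes, d)
    decode:   callable mapping final hidden states to task outputs
    """
    h = np.zeros_like(z_inp)  # H^(0): zero-initialized hidden states (assumption)
    for _ in range(num_steps):
        # Concatenate current hidden states with the encoded inputs, as in Eq. (1)
        h = gnn_step(np.concatenate([h, z_inp], axis=-1), adj)
    return decode(h)
```

The default of 32 recurrent steps mirrors the no-hint setting described in the experiment setup below, where the number of processing steps is fixed to 32.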
Open Source Code: Yes
LLM Response: Our code is available at: https://github.com/smahdavi4/clrs.
Open Datasets: Yes
LLM Response: Second, we analyze the main challenges (e.g., input distribution shift, non-representative data generation, and uninformative validation metrics) of the current leading benchmark, i.e., CLRS (Veličković et al., 2021), which contains 30 algorithmic reasoning tasks.
Dataset Splits: Yes
LLM Response: Table 3: Summary of dataset configurations used in this paper.

Dataset                          Lengths (Train/Val)  Lengths (Test)  Size (Train)  Size (Val/Test)  Dataset Generation
CLRS (Veličković et al., 2021)   16                   64              1000          32               Erdős–Rényi graphs with fixed p
L-CLRS                           16                   64              100 000       32               Erdős–Rényi graphs with fixed p
L-CLRS-Len                       16                   32              100 000       1000             K-regular graphs with K = 4
L-CLRS-Deg                       32                   32              100 000       1000             K-regular, K = 4 (train/val), K = 8 (test)
L-CLRS-Len-Deg                   16                   32              100 000       1000             K-regular, K = 4 (train/val), K = 8 (test)
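The fixed-p Erdős–Rényi scheme in the table above can be sketched as follows. The function name and interface are illustrative, not the benchmark's; for the K-regular variants, a library routine such as networkx's `random_regular_graph` could be used instead.

```python
import random

def erdos_renyi_adjacency(num_nodes, p, seed=0):
    """Sample an undirected Erdos-Renyi graph G(n, p) as a 0/1 adjacency
    matrix: each unordered node pair is connected independently with
    probability p (the fixed-p scheme of the CLRS / L-CLRS splits)."""
    rng = random.Random(seed)
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected: set both directions
    return adj
```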
Hardware Specification: No
LLM Response: Acknowledgments: This work was funded, in part, by NSERC DG Grants (No. RGPIN-2022-04636 and No. RGPIN-2021-03677), Google Cloud credits, and Oracle Cloud credits. Resources used in preparing this research were provided, in part, by Google, Advanced Research Computing at the University of British Columbia, the Oracle for Research program, and the Digital Research Alliance of Canada (alliancecan.ca). The paper mentions using 'Google Cloud credits', 'Oracle Cloud credits', and other computing resources, but it does not specify any particular GPU models, CPU types, or memory configurations used for the experiments.
Software Dependencies: No
LLM Response: For the rest of the experiments, we use a batch size of 32 for message-passing processors and 16 for high-order processors, gradient clipping to a norm of 1.0, a learning rate of 0.0001, 20 000 optimization steps, and a cosine annealing scheduler. The paper describes various hyperparameters and training strategies but does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup: Yes
LLM Response: For the rest of the experiments, we use a batch size of 32 for message-passing processors and 16 for high-order processors, gradient clipping to a norm of 1.0, a learning rate of 0.0001, 20 000 optimization steps, and a cosine annealing scheduler. Moreover, as the number of processing steps is meaningless in the no-hint version, we set this number to 32 (i.e., 32 recurrent steps of message passing).
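The reported schedule can be written down concretely. This is a minimal sketch of cosine annealing over the stated 20 000 steps, assuming the rate decays from the reported 0.0001 down to zero; the paper names only "cosine annealing scheduler", so the endpoint is an assumption.

```python
import math

def cosine_annealed_lr(step, total_steps=20_000, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `step` under cosine annealing: decays smoothly
    from base_lr at step 0 to min_lr at total_steps."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine
```

In a training loop, each optimization step would also clip gradients to a global norm of 1.0 before the optimizer update, per the setup above.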