Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

Authors: Yudong Xu, Wenhao Li, Scott Sanner, Elias Boutros Khalil

ICML 2025

Reproducibility variables, results, and LLM responses:
Research Type: Experimental — "Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations."
Researcher Affiliation: Collaboration — (1) Department of Mechanical & Industrial Engineering, University of Toronto; (2) Vector Institute for Artificial Intelligence; (3) Scale AI Research Chair in Data-Driven Algorithms for Modern Supply Chains.
Pseudocode: No — The paper describes the architecture and methods in textual form with accompanying mathematical equations and diagrams, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code: Yes — "Our implementation is available on GitHub." (https://github.com/khalil-research/ConsFormer)
Open Datasets: Yes — "We use the dataset from SATNet (Wang et al., 2019), which contains instances with [31, 42] missing values, as the training and in-distribution testing dataset. To test our model's ability to generalize to harder out-of-distribution instances, we use the dataset from RRN (Palm et al., 2018), which contains instances with [17, 34] missing values. ... Following the same setup as Tönshoff et al. (2023), we generate random graphs with 100 vertices for training and evaluate generalization on benchmark instances from the GSET dataset (Ye, 2003)."
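The random-graph training setup quoted above (100-vertex graphs, following the setup of Tönshoff et al.) could be sketched in plain Python as below. The Erdős–Rényi G(n, p) distribution, the edge probability p = 0.1, and the function name `random_graph` are illustrative assumptions, not details confirmed by the excerpt.

```python
import random

def random_graph(n=100, p=0.1, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph as a list of undirected edges.

    The 100-vertex size matches the training setup quoted above; the
    G(n, p) distribution and p = 0.1 are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible instance generation
    return [(u, v)
            for u in range(n)
            for v in range(u + 1, n)
            if rng.random() < p]

# Each edge (u, v) satisfies 0 <= u < v < n, so no self-loops or duplicates.
edges = random_graph()
```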
Dataset Splits: Yes — "The training set contains 9K instances for all problems. ... The in-distribution instances consisted of 9000 training instances and 1000 test instances. Out-of-distribution instances also had 1000 samples."
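A minimal sketch of the 9000/1000 in-distribution split described above, assuming a simple seeded shuffle; the excerpt does not state how the split was actually produced, and `split_instances` is a hypothetical helper name.

```python
import random

def split_instances(instances, n_train=9000, n_test=1000, seed=0):
    """Shuffle and partition instances into the 9K/1K train/test split.

    A generic sketch; the paper's actual split procedure is unspecified.
    """
    pool = list(instances)
    assert len(pool) >= n_train + n_test, "not enough instances to split"
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pool)
    return pool[:n_train], pool[n_train:n_train + n_test]

train, test = split_instances(range(10_000))
```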
Hardware Specification: Yes — "Our models were trained on various single-core GPU nodes, including P100, V100, and T4."
Software Dependencies: No — "For all models, we used AdamW as the optimizer and applied a dropout of 0.1, with learning rate set to 0.0001." The paper mentions NetworkX and OR-Tools as baselines or tools used for comparison but does not specify versions for any software components used in their own implementation.
Experiment Setup: Yes — "The final reported models were trained with a batch size of 512 for 5000 epochs. ... The hyperparameters for the best performing model for each of the problems is shown in Table 8. For all models, we used AdamW as the optimizer and applied a dropout of 0.1, with learning rate set to 0.0001."
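The reported hyperparameters can be collected into a single configuration; the dict layout and key names below are illustrative assumptions, while the values are those quoted in the experiment-setup response above.

```python
# Hyperparameters quoted in the experiment-setup response above.
# Key names are hypothetical; only the values come from the paper.
TRAIN_CONFIG = {
    "optimizer": "AdamW",    # optimizer used for all models
    "learning_rate": 1e-4,   # "learning rate set to 0.0001"
    "dropout": 0.1,          # dropout applied across all models
    "batch_size": 512,       # final reported models
    "epochs": 5000,
}
```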