Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

Authors: Yudong Xu, Wenhao Li, Scott Sanner, Elias Boutros Khalil

ICML 2025

Reproducibility variables, results, and LLM responses:
Research Type: Experimental — "Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations."
Researcher Affiliation: Collaboration — (1) Department of Mechanical & Industrial Engineering, University of Toronto; (2) Vector Institute for Artificial Intelligence; (3) Scale AI Research Chair in Data-Driven Algorithms for Modern Supply Chains.
Pseudocode: No — The paper describes the architecture and methods in textual form with accompanying mathematical equations and diagrams, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code: Yes — "Our implementation is available on GitHub." (https://github.com/khalil-research/ConsFormer)
Open Datasets: Yes — "We use the dataset from SATNet (Wang et al., 2019), which contains instances with [31, 42] missing values, as the training and in-distribution testing dataset. To test our model's ability to generalize to harder out-of-distribution instances, we use the dataset from RRN (Palm et al., 2018), which contains instances with [17, 34] missing values. ... Following the same setup as Tönshoff et al. (2023), we generate random graphs with 100 vertices for training and evaluate generalization on benchmark instances from the GSET dataset (Ye, 2003)."
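The random-graph training setup quoted above (100-vertex graphs, following the setup of Tönshoff et al.) could be sketched in plain Python as below. The Erdős–Rényi G(n, p) distribution, the edge probability p = 0.1, and the function name `random_graph` are illustrative assumptions, not details confirmed by the excerpt.

```python
import random

def random_graph(n=100, p=0.1, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph as a list of undirected edges.

    The 100-vertex size matches the training setup quoted above; the
    G(n, p) distribution and p = 0.1 are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible instance generation
    return [(u, v)
            for u in range(n)
            for v in range(u + 1, n)
            if rng.random() < p]

# Each edge (u, v) satisfies 0 <= u < v < n, so no self-loops or duplicates.
edges = random_graph()
```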
Dataset Splits: Yes — "The training set contains 9K instances for all problems. ... The in-distribution instances consisted of 9000 training instances and 1000 test instances. Out-of-distribution instances also had 1000 samples."
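A minimal sketch of the 9000/1000 in-distribution split described above, assuming a simple seeded shuffle; the excerpt does not state how the split was actually produced, and `split_instances` is a hypothetical helper name.

```python
import random

def split_instances(instances, n_train=9000, n_test=1000, seed=0):
    """Shuffle and partition instances into the 9K/1K train/test split.

    A generic sketch; the paper's actual split procedure is unspecified.
    """
    pool = list(instances)
    assert len(pool) >= n_train + n_test, "not enough instances to split"
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pool)
    return pool[:n_train], pool[n_train:n_train + n_test]

train, test = split_instances(range(10_000))
```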
Hardware Specification: Yes — "Our models were trained on various single-core GPU nodes, including P100, V100, and T4."
Software Dependencies: No — "For all models, we used AdamW as the optimizer and applied a dropout of 0.1, with learning rate set to 0.0001." The paper mentions NetworkX and OR-Tools as baselines or tools used for comparison but does not specify versions for any software components used in their own implementation.
Experiment Setup: Yes — "The final reported models were trained with a batch size of 512 for 5000 epochs. ... The hyperparameters for the best performing model for each of the problems is shown in Table 8. For all models, we used AdamW as the optimizer and applied a dropout of 0.1, with learning rate set to 0.0001."
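The reported hyperparameters can be collected into a single configuration; the dict layout and key names below are illustrative assumptions, while the values are those quoted in the experiment-setup response above.

```python
# Hyperparameters quoted in the experiment-setup response above.
# Key names are hypothetical; only the values come from the paper.
TRAIN_CONFIG = {
    "optimizer": "AdamW",    # optimizer used for all models
    "learning_rate": 1e-4,   # "learning rate set to 0.0001"
    "dropout": 0.1,          # dropout applied across all models
    "batch_size": 512,       # final reported models
    "epochs": 5000,
}
```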