reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations

Authors: Haotian Zhai, Connor Lawless, Ellen Vitercik, Liu Leqi

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate our approach, we construct EquivaFormulation, the first open-source dataset of equivalent optimization formulations, generated by applying transformations such as adding slack variables or valid inequalities to existing formulations. Empirically, EquivaMap significantly outperforms existing methods, achieving substantial improvements in correctly identifying formulation equivalence.
Researcher Affiliation	Academia	¹The University of Texas at Austin, TX, USA; ²Stanford University, CA, USA.
Pseudocode	Yes	Algorithm 1 EquivaMap
Open Source Code	Yes	The code and datasets are available at https://github.com/HumainLab/EquivaMap and https://huggingface.co/datasets/humainlab/EquivaFormulation.
Open Datasets	Yes	To evaluate our approach, we construct EquivaFormulation, the first open-source dataset of equivalent optimization formulations... The code and datasets are available at https://github.com/HumainLab/EquivaMap and https://huggingface.co/datasets/humainlab/EquivaFormulation. We construct EquivaFormulation based on the NLP4LP dataset (Ahmadi Teshnizi et al., 2024).
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits for reproducing the experiment. It describes generating a dataset and then evaluating methods on it without specifying how that dataset itself is split for learning or evaluation in a traditional train/test sense. It measures accuracy on the overall generated dataset.
Hardware Specification	No	The paper does not explicitly mention specific hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper mentions using 'GPT-4' and 'GPT-4o' as LLMs and 'Gurobi' as an MILP solver, but it does not specify explicit version numbers for these software components or any other libraries.
Experiment Setup	Yes	We use GPT-4 (Achiam et al., 2023) as the mapping finder in EquivaMap... We set K = 3, and report the accuracy as the percentage of paired formulations α and α′ that are correctly identified as equivalent or nonequivalent
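
The accuracy metric quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `equivalence_accuracy` and the encoding of each pair's label as a boolean (True for equivalent, False for nonequivalent) are hypothetical choices made here.

```python
from typing import Sequence


def equivalence_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Return the percentage of formulation pairs labeled correctly.

    Each entry corresponds to one pair of formulations; True means
    "equivalent" and False means "nonequivalent" (an assumed encoding).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one formulation pair is required")
    # A pair counts as correct when the predicted label matches the true label,
    # whether that label is "equivalent" or "nonequivalent".
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(predicted)


# Hypothetical labels for four formulation pairs; the third is misclassified.
score = equivalence_accuracy(
    predicted=[True, False, True, True],
    actual=[True, False, False, True],
)
```

Counting both correct "equivalent" and correct "nonequivalent" verdicts matches the wording of the quoted setup, which scores pairs identified correctly in either direction.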