OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Authors: Hongliang Lu, Zhonglin Xie, Yaoyu Wu, Can Ren, Yuxuan Chen, Zaiwen Wen

ICML 2025

Reproducibility assessment: Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we demonstrate that models of various sizes (0.5B-32B parameters) trained on OptMATH achieve superior results on multiple modeling benchmarks, thereby validating the effectiveness and scalability of our approach."
Researcher Affiliation | Academia | "1 College of Engineering, Peking University; 2 Beijing International Center for Mathematical Research, Peking University; 3 School of Mathematics Science, Peking University. Correspondence to: Zaiwen Wen <EMAIL>."
Pseudocode | Yes | Algorithm 1 (Feedback-Driven Problem Data Generation):
Require: target complexity range [S_min, S_max], time limits [T_min, T_max], instance generator G, feasibility threshold F_target, max iterations T
Ensure: configuration Θ such that for PD_i ~ G(Θ): S(PD_i) ∈ [S_min, S_max] (complexity), τ_i ≤ T_max (solving time), Pr(f_i = feasible) ≥ F_target
1: Initialize parameters via LLM: Θ_0 ← L(prompt ∥ IC(S_min, S_max, T_min, T_max))
2: for t = 1 to T do
3:   Generate N PDs: {PD_i}_{i=1}^N ~ G(Θ_{t-1})
4:   Compute metrics: S(PD_i) (Eq. 4), τ_i (solving time), f_i (feasibility)
5:   Aggregate statistics: S̄_t = (1/N) Σ S(PD_i), τ̄_t = (1/N) Σ τ_i, F_t = (1/N) Σ …
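The quoted loop can be sketched in plain Python. This is a sketch only: the helper names (llm_init_config, generate_instances, adjust_config, and the stub metric functions) are hypothetical stand-ins for the paper's LLM and generator components, which are not specified at this level of detail.

```python
# Sketch of Algorithm 1 (Feedback-Driven Problem Data Generation).
# The "LLM" steps are replaced here by simple deterministic stubs; in the
# paper, an LLM proposes the initial configuration Θ0 and adjusts Θ from
# the aggregated feedback statistics.

def llm_init_config(s_min, s_max):
    # Stub for Θ0 ← L(prompt): deliberately start below the target range.
    return {"size": s_min // 2 or 1}

def generate_instances(theta, n):
    # Stub for G(Θ): each generated problem instance is just its size here.
    return [theta["size"] for _ in range(n)]

def complexity(pd):
    return pd          # stand-in for S(PD_i) (Eq. 4 in the paper)

def solve_time(pd):
    return 0.01 * pd   # stand-in for τ_i

def is_feasible(pd):
    return True        # stand-in for f_i

def adjust_config(theta, s_bar, s_min, s_max):
    # Stub for the LLM feedback step: nudge size toward the target midpoint.
    target = (s_min + s_max) / 2
    theta["size"] = max(1, int(theta["size"] * target / max(s_bar, 1)))
    return theta

def feedback_driven_generation(s_min, s_max, t_max, f_target,
                               n=10, max_iters=20):
    theta = llm_init_config(s_min, s_max)
    for _ in range(max_iters):
        pds = generate_instances(theta, n)
        s_bar = sum(complexity(p) for p in pds) / n      # S̄_t
        tau_bar = sum(solve_time(p) for p in pds) / n    # τ̄_t
        f_rate = sum(is_feasible(p) for p in pds) / n    # F_t
        if s_min <= s_bar <= s_max and tau_bar <= t_max and f_rate >= f_target:
            return theta  # all three targets met
        theta = adjust_config(theta, s_bar, s_min, s_max)
    return theta
```

With the stubs above, `feedback_driven_generation(50, 100, 5.0, 0.9)` converges in two iterations; the real algorithm replaces the stubs with an instance generator, a solver, and LLM-driven configuration updates.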
Open Source Code | Yes | "The OptMATH dataset and related resources are available at https://github.com/optsuite/OptMATH."
Open Datasets | Yes | "The OptMATH dataset and related resources are available at https://github.com/optsuite/OptMATH. ... We evaluate our fine-tuned model on five benchmarks: NL4OPT (Ramamonjison et al., 2021), MAMO (Huang et al., 2024), IndustryOR (Tang et al., 2024), OptiBench (Yang et al., 2025) and our newly constructed OptMATH-Bench."
Dataset Splits | No | The paper mentions "OptMATH-Train" as a training dataset and "OptMATH-Bench" as a benchmark, and refers to pre-existing test sets for other benchmarks (e.g., "we selected the test set from this dataset" for NL4OPT). It also states MAMO is "divided into two main components, Easy LP and Complex LP, containing 652 and 211 instances, respectively". However, it does not provide specific train/validation/test percentages or counts for its own generated OptMATH-Train dataset.
Hardware Specification | No | "The computational resources were supported by the Center for Intelligent Computing and Song-Shan Lake HPC Center (SSL-HPC) in Great Bay University, Dongguan, China." This statement provides general information about computational resources but lacks specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | Yes | "MOSEK ApS. MOSEK Optimization Software, 2025. Version 11.0.3."
Experiment Setup | Yes | "We adopt a supervised fine-tuning (SFT) approach to enhance the AutoFormulator's modeling capabilities. Specifically, we employ the LoRA algorithm (Hu et al., 2021) for parameter-efficient fine-tuning... We select the Qwen2.5 series (0.5B-32B) as our base models (Yang et al., 2024), and the hyperparameters are generally set as follows: initial learning rate of 1e-4, 1-3 epochs, LoRA rank of 32, LoRA alpha of 32, and LoRA dropout of 0.1."
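For reference, the hyperparameters quoted above can be collected into a small configuration sketch. This is an assumption-laden reconstruction: the paper does not publish its training script, so the dict keys and the HuggingFace peft mapping mentioned in the comment are illustrative, not the authors' actual code.

```python
# LoRA/SFT hyperparameters as reported in the paper's experiment setup.
# Sketch only: the key names and any framework mapping are assumptions.
sft_config = {
    "method": "LoRA",                # Hu et al., 2021
    "base_model_family": "Qwen2.5",  # 0.5B-32B parameter variants
    "learning_rate": 1e-4,           # initial learning rate
    "epochs_range": (1, 3),          # "1-3 epochs"
    "lora_rank": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
}

# With HuggingFace peft (an assumed mapping), this would correspond roughly to:
#   LoraConfig(r=32, lora_alpha=32, lora_dropout=0.1, task_type="CAUSAL_LM")

# Note the LoRA scaling factor alpha/r = 1, i.e. adapter updates are
# applied at unit scale.
scaling = sft_config["lora_alpha"] / sft_config["lora_rank"]
```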