NS4S: Neighborhood Search for Scheduling Problems Via Large Language Models

Authors: Junjie Zhang, Canhui Luo, Zhouxing Su, Qingyun Zhang, Zhipeng Lü, Junwen Ding, Yan Jin

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental
LLM Response: Extensive experiments are conducted on 349 benchmark instances across three classical scheduling problems. The results demonstrate that our algorithm significantly outperforms existing state-of-the-art methods. For JSP, our algorithm reduces the average optimality gap from 10.46% to 1.35% on Taillard's instances compared to reinforced adaptive staircase curriculum learning. For flexible JSP (FJSP), it reduces the gap from 13.24% to 0.05% on Brandimarte's instances compared to deep reinforcement learning methods. Furthermore, for FJSP with sequence-dependent setup time, our algorithm updates 9 upper bounds for benchmark instances.
Researcher Affiliation | Academia
LLM Response: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. EMAIL, EMAIL
Pseudocode | Yes
LLM Response:
Algorithm 1 The main framework of the NS4S algorithm
Input: JSP instance
Output: the best solution found so far S*
 1: S ← Init(), S* ← S
 2: f* ← S.makespan
 3: while stopping condition is not met do
 4:     move(o*, k*) ← LLMGuided_Neighbor_Eval(S)   /* Section 3.2 */
 5:     S ← S ⊕ move(o*, k*)
 6:     f ← S.makespan
 7:     Weight_Update_By_LLMs()   /* Section 3.3 */
 8:     if f* > f then
 9:         S* ← S, f* ← f
10:     end if
11: end while
12: Return S*
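The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch only: `init_solution`, `apply_move`, and both LLM-guided components are placeholder stubs invented for illustration, not the paper's implementation (which couples an LLM-evolved move evaluator with a C++ neighborhood search).

```python
import random

def init_solution():
    # Placeholder for Init(): the real version builds an initial JSP schedule.
    # Here a "solution" is just a record carrying a makespan value.
    return {"makespan": 100.0}

def llm_guided_neighbor_eval(s):
    # Stand-in for LLMGuided_Neighbor_Eval (Section 3.2): the real version
    # uses an LLM-evolved heuristic to pick the most promising move.
    return ("swap", random.randint(0, 9))

def apply_move(s, move):
    # Stand-in for S <- S (+) move(o*, k*): applying a move perturbs the makespan.
    delta = random.uniform(-5.0, 3.0)
    return {"makespan": max(1.0, s["makespan"] + delta)}

def weight_update_by_llms():
    # Stand-in for Weight_Update_By_LLMs (Section 3.3): the real version lets
    # an LLM adjust move-type weights; a no-op in this sketch.
    pass

def ns4s(max_iters=200, seed=0):
    """Minimal sketch of Algorithm 1: track the best solution seen so far."""
    random.seed(seed)
    s = init_solution()
    best, best_f = s, s["makespan"]
    for _ in range(max_iters):          # stand-in stopping condition
        move = llm_guided_neighbor_eval(s)
        s = apply_move(s, move)
        f = s["makespan"]
        weight_update_by_llms()
        if best_f > f:                  # minimization: smaller makespan wins
            best, best_f = s, f
    return best

print(ns4s()["makespan"] <= 100.0)      # → True (the initial makespan never worsens)
```

The key invariant mirrored from the pseudocode is that the incumbent S is perturbed freely each iteration, while S* is only ever replaced on strict improvement.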
Open Source Code | Yes
LLM Response: NS4S not only surpasses the Memetic Algorithm (MA) in terms of both solution quality and computational efficiency but also establishes nine new upper bounds across the benchmark instances. Repository: https://github.com/Zoommy/NS4S
Open Datasets | Yes
LLM Response: Benchmark Datasets: To evaluate the efficacy of the proposed NS4S algorithm, we conducted comprehensive experiments across three classical scheduling problems: JSP, FJSP, and FJSP-SDST. For JSP, we utilized the TA and DMU benchmark instances, comprising 160 instances that represent the most challenging publicly available test cases. For FJSP, we employed the BR, HU, and Vdata benchmark suites, encompassing 169 diverse instances. For FJSP-SDST, we employed the SDST-Hudata benchmark set, containing 20 instances of varying complexity.
Dataset Splits | No
LLM Response: The paper discusses various benchmark instances for evaluation (TA, DMU, BR, HU, Vdata, SDST-Hudata) and mentions that the VeEvo framework uses "randomly generated instances" for training its heuristics. However, it does not specify how these benchmark instances are split into training, validation, or test sets for the overall experimental evaluation, nor does it detail how the "randomly generated instances" used to train VeEvo are produced.
Hardware Specification | Yes
LLM Response: Our implementation combines Python for the VeEvo framework and C++ for the neighborhood search algorithm, executed on a single Intel Xeon E5-2697 v3 CPU.
Software Dependencies | No
LLM Response: The paper states that the implementation combines Python for the VeEvo framework and C++ for the neighborhood search algorithm, but it does not provide version numbers for these languages or for any libraries, frameworks, or solvers utilized.
Experiment Setup | Yes
LLM Response: To ensure a fair comparison with Ye et al. [2024], we configured VeEvo with a population size of 10 and a maximum of 100 evaluations. We imposed cutoff times of 10 seconds for JSP and FJSP, and 30 seconds for FJSP-SDST.
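The reported settings could be collected in a single configuration object for a re-run, sketched below. The field names (`population_size`, `max_evaluations`, `cutoff_seconds`) are hypothetical, chosen for this sketch; only the numeric values come from the paper.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NS4SConfig:
    # Values as reported in the paper's experiment setup section.
    population_size: int = 10          # VeEvo population size
    max_evaluations: int = 100         # VeEvo evaluation budget
    cutoff_seconds: dict = field(default_factory=lambda: {
        "JSP": 10,                     # per-instance time limit (seconds)
        "FJSP": 10,
        "FJSP-SDST": 30,
    })

cfg = NS4SConfig()
print(cfg.population_size, cfg.cutoff_seconds["FJSP-SDST"])  # → 10 30
```

Keeping the per-problem cutoffs in one mapping makes it explicit that FJSP-SDST is given a longer budget (30 s) than the other two problems (10 s each).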