reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An LLM-Empowered Adaptive Evolutionary Algorithm for Multi-Component Deep Learning Systems

Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, An Guo, Yuan Zhou, Jie Zhang, Shuo Li, Jun Wei, Tianwei Zhang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach in finding safety violations of MCDL systems, and compare its performance with state-of-the-art MOEA methods. Experimental results show that our approach can significantly improve the efficiency and diversity of the evolutionary search.
Researcher Affiliation	Collaboration	Haoxiang Tian (1,2), Xingshuo Han (3), Guoquan Wu (1,4), An Guo (5), Yuan Zhou (6), Jie Zhang (7), Shuo Li (1), Jun Wei (1,4), Tianwei Zhang (2). Affiliations: (1) Key Lab of System Software at CAS, State Key Lab of Computer Science at ISCAS, University of CAS, Beijing; (2) Nanyang Technological University, Singapore; (3) Continental-NTU Corporate Lab, Singapore; (4) Nanjing Institute of Software Technology, University of CAS, Nanjing; (5) Nanjing University, Nanjing; (6) Zhejiang Sci-Tech University, Hangzhou; (7) CFAR and IHPC, A*STAR, Singapore
Pseudocode	Yes	Algorithm 1: LLM-empowered adaptive evolutionary search
Open Source Code	No	No explicit statement or link for the code of µMOEA is provided. The paper mentions using Baidu Apollo (github.com/ApolloAuto/apollo) and SORA-SVL (Huai 2023), which are third-party tools, not the authors' implementation.
Open Datasets	No	We select the industrial full-stack ADS, Baidu Apollo (Baidu Apollo 2013) to evaluate the ability of µMOEA in finding safety violations of MCDL systems, due to the representativeness, practicality and advancedness. ... San Francisco map are selected to execute the generated solutions. No explicit access information (link, DOI, etc.) for a dataset is provided for experimental data, beyond citing the Baidu Apollo platform and mentioning a map.
Dataset Splits	No	The paper describes an evolutionary search process to detect safety violations in an autonomous driving system (Baidu Apollo) using a simulator. It does not involve traditional machine learning dataset splits for training, validation, or testing, as it evaluates the system directly rather than training a model on a dataset.
Hardware Specification	Yes	We conducted the experiments on Ubuntu 20.04 with 500 GB memory, an Intel Core i7 CPU, and an NVIDIA GTX2080 TI.
Software Dependencies	No	The paper mentions 'Ubuntu 20.04' and 'SORA-SVL (Huai 2023) (an end-to-end AV simulation platform which supports connection with Apollo)', but does not provide specific version numbers for critical software libraries or dependencies used for their methodology.
Experiment Setup	Yes	We run µMOEA for 24 hours to detect safety violations of Apollo. ... For each run, on average, 3756 solutions (min 3346 and max 4015) are generated by µMOEA ... To disrupt the solutions with above-average fitness values to search the spaces for the region with global optimum, and ensure that all solutions with subaverage fitness values compulsorily undergo mutation, we use a value of 0.6 for k1 and k2, and 1.0 for k3 and k4. ... For the sake of fairness, in each 24-hour running, the number of individuals in each generation of MOSAT and µMOEA are the same.
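
The k1 to k4 values quoted in the Experiment Setup row resemble the coefficients of the classic adaptive genetic algorithm of Srinivas and Patnaik (1994), in which crossover and mutation probabilities shrink as a solution's fitness approaches the population maximum. The sketch below illustrates that general scheme with the quoted values (0.6 for k1 and k2, 1.0 for k3 and k4). It is an assumption that µMOEA uses this exact formulation; the function name `adaptive_rates`, its parameters, and the maximization convention are hypothetical and are not taken from the paper.

```python
def adaptive_rates(f_parent, f_individual, f_max, f_avg,
                   k1=0.6, k2=0.6, k3=1.0, k4=1.0):
    """Return (crossover_prob, mutation_prob) for one solution.

    Assumed formulation (Srinivas and Patnaik style, fitness is maximized):
      f_parent     -- the larger fitness of the two parents to be crossed
      f_individual -- fitness of the solution considered for mutation
      f_max, f_avg -- maximum and average fitness of the current population
    """
    span = f_max - f_avg
    if span <= 0:
        # Degenerate population (all fitness values equal): fall back to
        # the constant rates used for subaverage solutions.
        return k3, k4

    # Above-average solutions get rates scaled by their distance to the best.
    if f_parent >= f_avg:
        pc = k1 * (f_max - f_parent) / span
    else:
        pc = k3  # subaverage parents use the constant crossover rate

    if f_individual >= f_avg:
        pm = k2 * (f_max - f_individual) / span
    else:
        pm = k4  # with k4 = 1.0, subaverage solutions always mutate

    return pc, pm


if __name__ == "__main__":
    # Population with maximum fitness 1.0 and average fitness 0.5.
    print(adaptive_rates(0.9, 0.9, f_max=1.0, f_avg=0.5))  # near-best solution
    print(adaptive_rates(0.2, 0.2, f_max=1.0, f_avg=0.5))  # subaverage solution
```

Under these assumptions, setting k4 to 1.0 matches the quoted statement that all subaverage solutions compulsorily undergo mutation, while the best solution in the population receives a rate of zero and is left undisturbed.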