reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Model Lineage Closeness Analysis

Authors: Chen Tang, Lan Zhang, Qi Zhao, Xirong Zhuang, Xiang-Yang Li

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, comprehensive experiments show that our design achieves an impressive 97% accuracy in lineage determination and can precisely measure model lineage closeness for different modifications.
Researcher Affiliation	Academia	1 University of Science and Technology of China, China; 2 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China. EMAIL, EMAIL
Pseudocode	No	The paper describes methods using mathematical formulations and a workflow diagram (Fig. 2) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We make our code and this benchmark open-source. (Footnote 1: https://github.com/chentangUSTCCS/Model Lineage Closeness)
Open Datasets	Yes	Specifically, we used four datasets: MNIST (LeCun et al. 1998), CIFAR-10 (Krizhevsky, Hinton et al. 2009), Flower102 (Nilsback and Zisserman 2008) and SDog120 (Khosla et al. 2011), four model structures: LeNet (LeCun et al. 1998), VGG16 (Simonyan and Zisserman 2015b), MobileNetV2 (Sandler et al. 2018) and ResNet18 (He et al. 2016), to train two sets of models.
Dataset Splits	No	The paper discusses generating samples for lineage determination and a sampling method for a test set, but it does not provide specific training/test/validation splits for the datasets (e.g., MNIST, CIFAR-10) used to train the base models.
Hardware Specification	Yes	Moreover, all experiments are conducted on a Linux Server with 1 Tesla P100 GPU and implemented with PyTorch 1.5 using Python 3.7.
Software Dependencies	Yes	Moreover, all experiments are conducted on a Linux Server with 1 Tesla P100 GPU and implemented with PyTorch 1.5 using Python 3.7.
Experiment Setup	Yes	To determine the threshold, we generate 4 lineage models and 2 no lineage models for each source model and calculate the lineage closeness score. Then we set the threshold δ to 0.35 which can well distinguish generated lineage models and no lineage models.
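
The threshold rule quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a higher closeness score means a closer lineage relationship, which the quoted passage does not state, and the function names and example scores are hypothetical. Only the value δ = 0.35 comes from the paper.

```python
from typing import Optional, Sequence, Tuple

# Threshold reported in the quoted Experiment Setup passage.
DELTA = 0.35


def separating_interval(
    lineage_scores: Sequence[float],
    non_lineage_scores: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Return (low, high) such that any threshold strictly between them
    separates the two groups, or None if the groups overlap.

    Assumption: higher score = closer lineage.
    """
    low = max(non_lineage_scores)   # highest score among non-lineage models
    high = min(lineage_scores)      # lowest score among lineage models
    return (low, high) if low < high else None


def has_lineage(score: float, threshold: float = DELTA) -> bool:
    """Classify a model as a lineage model if its score reaches the threshold."""
    return score >= threshold


# Hypothetical scores for one source model: 4 lineage models, 2 non-lineage models.
example_lineage = [0.6, 0.5, 0.8, 0.45]
example_non_lineage = [0.1, 0.2]
interval = separating_interval(example_lineage, example_non_lineage)
```

With these made-up scores, any threshold strictly between 0.2 and 0.45 separates the two groups, so 0.35 would qualify. Real scores would come from the paper's closeness metric, which is not reproduced here.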