Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks

Authors: Runpeng Yu, Xinchao Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments, ranging from shallow fully-connected networks to open-sourced Stable Diffusion and LLaMA models, progressively validate the effectiveness of both methods. The results demonstrate the reliability of both the learning-free and the learning-based approaches across various learning tasks and network architectures, as well as their ability to detect cross-generational phylogeny between ancestor models and their fine-tuned descendants.
Researcher Affiliation | Academia | Runpeng Yu, Xinchao Wang, National University of Singapore, EMAIL, EMAIL
Pseudocode | No | The paper describes methods and workflows but does not contain explicitly labeled pseudocode or algorithm blocks. Figure 5 provides a diagrammatic illustration of the KMeans-along method's process, but it is not pseudocode.
Open Source Code | No | The paper does not provide an explicit statement about the release of its own source code, nor does it include a link to a code repository. It mentions using 'the diffusers library to implement the DreamBooth fine-tuning and image generation for diffusion models', but this refers to a third-party tool.
Open Datasets | Yes | For fully-connected and convolutional networks, the parent models are trained by us on MNIST (Lecun et al., 1998) and CIFAR-100 (Krizhevsky et al., 2009). Parent models are fine-tuned on FMNIST (Xiao et al., 2017), EMNIST-Letters (Cohen et al., 2017), and CIFAR-10 (Krizhevsky et al., 2009) datasets to construct the child model sets.
Dataset Splits | Yes | For the phylogeny detector, we split the child models into training, validation, and testing sets in a 7:1:2 ratio and report the average accuracy over five runs on the testing set.
Hardware Specification | Yes | Experiments are conducted on one Nvidia RTX 4090.
Software Dependencies | No | The paper mentions several libraries, such as the 'diffusers library', 'sklearn library', and 'timm library', but it does not specify their version numbers, which is necessary for reproducible software dependencies.
Experiment Setup | Yes | The training process uses a learning rate of either 0.01 or 0.001, a batch size of either 256 or 1024, and parameter initialization with either Kaiming Uniform or Kaiming Normal, using eight random seeds. The number of training epochs is set to 50, and the Adam optimizer is used. [...] The fine-tuning learning rate is set to one of 0.01, 0.001, or 0.0001, with a batch size of either 256 or 1024, using four random seeds. The number of fine-tuning epochs is set to 30, using the Adam optimizer. [...] The edge encoder's kernel size is set to 3, and the latent embedding size is set to 32. The training learning rate is 0.01, the batch size is 1, and the number of epochs is 100 for all experiments. The Adam optimizer is used.
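The 7:1:2 split and five-run averaging quoted in the Dataset Splits row can be sketched as below. This is a minimal illustration, not the authors' code: the `evaluate` callback and per-run seeding are assumptions introduced here for clarity.

```python
import random

def split_7_1_2(items, seed=0):
    """Shuffle a list of child models and split it into train/val/test at a 7:1:2 ratio."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

def mean_test_accuracy(child_models, evaluate, n_runs=5):
    """Average test accuracy over several runs, re-splitting with a different seed each run.

    `evaluate(train, val, test)` is a hypothetical callback that trains the
    phylogeny detector and returns its test accuracy.
    """
    accs = []
    for seed in range(n_runs):
        train, val, test = split_7_1_2(child_models, seed=seed)
        accs.append(evaluate(train, val, test))
    return sum(accs) / n_runs
```

For 100 child models, `split_7_1_2` yields subsets of sizes 70, 10, and 20.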
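The Experiment Setup row implies a hyperparameter grid: 2 learning rates x 2 batch sizes x 2 initializations x 8 seeds for parent training, and 3 learning rates x 2 batch sizes x 4 seeds for fine-tuning. A sketch of how those grids could be enumerated follows; the dict layout and key names are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Parent-model training grid: 2 x 2 x 2 x 8 = 64 configurations,
# each trained for 50 epochs with Adam.
parent_grid = [
    {"lr": lr, "batch_size": bs, "init": init, "seed": seed,
     "epochs": 50, "optimizer": "Adam"}
    for lr, bs, init, seed in product(
        [0.01, 0.001],
        [256, 1024],
        ["kaiming_uniform", "kaiming_normal"],
        range(8),
    )
]

# Fine-tuning grid: 3 x 2 x 4 = 24 configurations per parent,
# each fine-tuned for 30 epochs with Adam.
finetune_grid = [
    {"lr": lr, "batch_size": bs, "seed": seed,
     "epochs": 30, "optimizer": "Adam"}
    for lr, bs, seed in product(
        [0.01, 0.001, 0.0001],
        [256, 1024],
        range(4),
    )
]

print(len(parent_grid), len(finetune_grid))  # 64 24
```

Enumerating the grid up front makes the run counts explicit, which is useful when checking that a reproduction covers the same configuration space.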