Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales

Authors: Jiaze Xu, Shiyu Xia, Xu Yang, Jiaqi Lv, Xin Geng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the superiority of TAL. Models initialized with TAL outperform those initialized using GHN method by an average of 24.39% in terms of accuracy across Decathlon datasets. We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene.
Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University, Nanjing 210096, China 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. Correspondence to: Jiaqi Lv <EMAIL>.
Pseudocode | No | The paper includes mathematical formulations and an optimization objective, but no structured pseudocode or algorithm blocks are explicitly presented in the main text or appendices.
Open Source Code | Yes | We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene.
Open Datasets | Yes | The vision tasks are evaluated on the Visual Domain Decathlon Challenge (Rebuffi et al., 2017), which comprises 10 diverse datasets: (1) ImageNet-1K (IN-1K) (Russakovsky et al., 2015), (2) CIFAR-100 (C100) (Krizhevsky et al., 2009), (3) Aircraft (Airc.) (Maji et al., 2013), (4) Daimler pedestrian classification (DPed) (Munder & Gavrila, 2006), (5) Describable textures (DTD) (Cimpoi et al., 2014), (6) German traffic signs (GSTR) (Stallkamp et al., 2012), (7) Omniglot (OGlt) (Lake et al., 2015), (8) SVHN (Netzer et al., 2011), (9) UCF101 Dynamic Images (UCF) (Soomro et al., 2012), and (10) Flowers102 (Flwr) (Nilsback & Zisserman, 2008).
Dataset Splits | No | The paper uses standard benchmark datasets such as ImageNet-1K and CIFAR-100 and mentions 'training architectures' and 'training data samples', but it never states the train/validation/test splits explicitly: no percentages, no sample counts, and no reference to predefined splits. Although these benchmarks are commonly used with standard splits, the paper does not specify its splitting methodology.
Hardware Specification | Yes | Training time comparison of different methods, all experiments run on an NVIDIA RTX 4090 with time measured in hours (h).
Software Dependencies | No | All models are trained using automatic mixed precision in PyTorch. The paper names PyTorch but gives no version number for it or for any other software dependency.
Experiment Setup | Yes | For both TAL and TAL+, we first pretrain the hypernets on ImageNet-1K for 75 epochs, followed by 100 epochs of multi-task training on the Decathlon Challenge datasets. All models are trained using automatic mixed precision in PyTorch, with a cosine annealing learning rate schedule starting at lr=3e-4, weight decay λ=1e-2 and predicted parameter regularization γ=3e-5 (Knyazev et al., 2023). We use a pretrained ViT-Base (Dosovitskiy, 2020) as the ancestry model. During multi-task training, we sample tasks using conventional temperature-based sampling (Raffel et al., 2020) with a temperature of T=2 for all methods.
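The temperature-based sampling cited above (Raffel et al., 2020) draws each task with probability proportional to its dataset size raised to the power 1/T, so T=1 recovers size-proportional sampling and large T approaches uniform sampling over tasks. A minimal sketch of this convention (the function name and example sizes are illustrative, not from the paper's code):

```python
import numpy as np

def temperature_sampling_probs(sizes, T=2.0):
    """Temperature-based multi-task sampling probabilities.

    p_i is proportional to n_i ** (1/T), where n_i is the number of
    training examples for task i. T=1 gives proportional sampling;
    T -> infinity approaches uniform sampling across tasks.
    """
    rates = np.asarray(sizes, dtype=float) ** (1.0 / T)
    return rates / rates.sum()

# Illustrative: a large task (1M examples) and a small one (10k examples).
# With T=2, the size ratio of 100:1 is softened to sqrt(100) = 10:1.
probs = temperature_sampling_probs([1_000_000, 10_000], T=2.0)
```

One could then feed `probs` to `np.random.choice` to pick which task's batch to train on at each step; the softened ratio keeps small Decathlon datasets from being drowned out by ImageNet-1K.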