Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales

Authors: Jiaze Xu, Shiyu Xia, Xu Yang, Jiaqi Lv, Xin Geng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the superiority of TAL. Models initialized with TAL outperform those initialized using GHN method by an average of 24.39% in terms of accuracy across Decathlon datasets. We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene.
Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University, Nanjing 210096, China 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. Correspondence to: Jiaqi Lv <EMAIL>.
Pseudocode | No | The paper includes mathematical formulations and an optimization objective, but no structured pseudocode or algorithm blocks are explicitly presented in the main text or appendices.
Open Source Code | Yes | We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene.
Open Datasets | Yes | The vision tasks are evaluated on the Visual Domain Decathlon Challenge (Rebuffi et al., 2017), which comprises 10 diverse datasets: (1) ImageNet-1K (IN-1K) (Russakovsky et al., 2015), (2) CIFAR-100 (C100) (Krizhevsky et al., 2009), (3) Aircraft (Airc.) (Maji et al., 2013), (4) Daimler pedestrian classification (DPed) (Munder & Gavrila, 2006), (5) Describable textures (DTD) (Cimpoi et al., 2014), (6) German traffic signs (GSTR) (Stallkamp et al., 2012), (7) Omniglot (OGlt) (Lake et al., 2015), (8) SVHN (Netzer et al., 2011), (9) UCF101 Dynamic Images (UCF) (Soomro et al., 2012), and (10) Flowers102 (Flwr) (Nilsback & Zisserman, 2008).
Dataset Splits | No | The paper uses standard benchmark datasets such as ImageNet-1K and CIFAR-100 and mentions 'training architectures' and 'training data samples', but it never states the train/validation/test splits explicitly: no percentages, no sample counts, and no reference to predefined splits. Although these benchmarks are commonly used with standard splits, the paper does not specify its splitting methodology.
Hardware Specification | Yes | Training time comparison of different methods, all experiments run on an NVIDIA RTX 4090 with time measured in hours (h).
Software Dependencies | No | All models are trained using automatic mixed precision in PyTorch. The paper names PyTorch but gives no version number for it or for any other software dependency.
Experiment Setup | Yes | For both TAL and TAL+, we first pretrain the hypernets on ImageNet-1K for 75 epochs, followed by 100 epochs of multi-task training on the Decathlon Challenge datasets. All models are trained using automatic mixed precision in PyTorch, with a cosine annealing learning rate schedule starting at lr=3e-4, weight decay λ=1e-2 and predicted parameter regularization γ=3e-5 (Knyazev et al., 2023). We use a pretrained ViT-Base (Dosovitskiy, 2020) as the ancestry model. During multi-task training, we sample tasks using conventional temperature-based sampling (Raffel et al., 2020) with a temperature of T=2 for all methods.
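The temperature-based sampling cited above (Raffel et al., 2020) draws each task with probability proportional to its dataset size raised to the power 1/T, so T=1 recovers size-proportional sampling and large T approaches uniform sampling over tasks. A minimal sketch of this convention (the function name and example sizes are illustrative, not from the paper's code):

```python
import numpy as np

def temperature_sampling_probs(sizes, T=2.0):
    """Temperature-based multi-task sampling probabilities.

    p_i is proportional to n_i ** (1/T), where n_i is the number of
    training examples for task i. T=1 gives proportional sampling;
    T -> infinity approaches uniform sampling across tasks.
    """
    rates = np.asarray(sizes, dtype=float) ** (1.0 / T)
    return rates / rates.sum()

# Illustrative: a large task (1M examples) and a small one (10k examples).
# With T=2, the size ratio of 100:1 is softened to sqrt(100) = 10:1.
probs = temperature_sampling_probs([1_000_000, 10_000], T=2.0)
```

One could then feed `probs` to `np.random.choice` to pick which task's batch to train on at each step; the softened ratio keeps small Decathlon datasets from being drowned out by ImageNet-1K.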