Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales
Authors: Jiaze Xu, Shiyu Xia, Xu Yang, Jiaqi Lv, Xin Geng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show the superiority of TAL. Models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% in accuracy across the Decathlon datasets. We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University, Nanjing 210096, China 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. Correspondence to: Jiaqi Lv <EMAIL>. |
| Pseudocode | No | The paper includes mathematical formulations and an optimization objective, but no structured pseudocode or algorithm blocks are explicitly presented in the main text or appendices. |
| Open Source Code | Yes | We provide the code at https://github.com/mathieuxu/Task-Aware-Learngene. |
| Open Datasets | Yes | The vision tasks are evaluated on the Visual Domain Decathlon Challenge (Rebuffi et al., 2017), which comprises 10 diverse datasets: (1) ImageNet-1K (IN-1K) (Russakovsky et al., 2015), (2) CIFAR-100 (C100) (Krizhevsky et al., 2009), (3) Aircraft (Airc.) (Maji et al., 2013), (4) Daimler pedestrian classification (DPed) (Munder & Gavrila, 2006), (5) Describable textures (DTD) (Cimpoi et al., 2014), (6) German traffic signs (GTSR) (Stallkamp et al., 2012), (7) Omniglot (OGlt) (Lake et al., 2015), (8) SVHN (Netzer et al., 2011), (9) UCF101 Dynamic Images (UCF) (Soomro et al., 2012), and (10) Flowers102 (Flwr) (Nilsback & Zisserman, 2008). |
| Dataset Splits | No | The paper references standard benchmarks such as ImageNet-1K and CIFAR-100 and mentions 'training architectures' and 'training data samples', but it does not state the train/validation/test splits used: no percentages, sample counts, or references to predefined splits are given. Although these benchmarks come with conventional splits, the paper never specifies which splitting methodology it follows. |
| Hardware Specification | Yes | Training time comparison of different methods, all experiments run on an NVIDIA RTX 4090 with time measured in hours (h). |
| Software Dependencies | No | All models are trained using automatic mixed precision in PyTorch. The paper names PyTorch but gives no version number, and lists no other software dependencies with versions. |
| Experiment Setup | Yes | For both TAL and TAL+, we first pretrain the hypernets on ImageNet-1K for 75 epochs, followed by 100 epochs of multi-task training on the Decathlon Challenge datasets. All models are trained using automatic mixed precision in PyTorch, with a cosine annealing learning rate schedule starting at lr=3e-4, weight decay λ=1e-2 and predicted parameter regularization γ=3e-5 (Knyazev et al., 2023). We use a pretrained ViT-Base (Dosovitskiy, 2020) as the ancestry model. During multi-task training, we sample tasks using conventional temperature-based sampling (Raffel et al., 2020) with a temperature of T=2 for all methods. |
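The temperature-based task sampling cited in the setup row (Raffel et al., 2020) is not spelled out in the excerpt. A minimal sketch of the usual formulation, where task i is drawn with probability proportional to |D_i|^(1/T); the function name and use of NumPy are assumptions, not code from the paper:

```python
import numpy as np

def temperature_sampling_probs(dataset_sizes, T=2.0):
    """Temperature-based task sampling (Raffel et al., 2020):
    task i is sampled with probability proportional to |D_i|**(1/T).
    T=1 recovers size-proportional sampling; larger T flattens
    the distribution toward uniform."""
    sizes = np.asarray(dataset_sizes, dtype=np.float64)
    weights = sizes ** (1.0 / T)
    return weights / weights.sum()
```

With T=2, as in the quoted setup, large datasets such as ImageNet-1K are down-weighted relative to size-proportional sampling, which keeps the smaller Decathlon tasks from being starved during multi-task training.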