KIND: Knowledge Integration and Diversion for Training Decomposable Models
Authors: Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Yong Rui, Xin Geng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that models pretrained with KIND can be decomposed into learngenes and tailors, which can be adaptively recombined for diverse resource-constrained deployments. Moreover, for tasks with large domain shifts, transferring only learngenes with task-agnostic knowledge, when combined with randomly initialized tailors, effectively mitigates domain shifts. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Southeast University, Nanjing, China; (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China; (3) Lenovo Research. Correspondence to: Jing Wang <EMAIL>, Xin Geng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 presents the pseudocode for diverting class-agnostic knowledge into learngenes and class-specific knowledge into tailors. |
| Open Source Code | No | Code will be made available at https://github.com/Te4P0t/KIND. |
| Open Datasets | Yes | We conduct class-conditioned generation on ImageNet-1K (Deng et al., 2009), which contains 1,000 classes. To minimize inter-class similarity, we merge certain similar classes based on their superclasses in WordNet (Miller, 1995), resulting in a final set of 611 classes. Among these, 150 classes are used for pre-training the diffusion models, while the remaining 461 classes serve as novel classes for constructing downstream tasks. Further details can be found in Appendix A.3. Additionally, we use datasets, including CelebA-HQ (Huang et al., 2018), Hubble (Weinzierl, 2023), MRI, and Pokémon, to simulate large domain shifts compared to the training data. |
| Dataset Splits | Yes | To minimize inter-class similarity, we merge certain similar classes based on their superclasses in WordNet (Miller, 1995), resulting in a final set of 611 classes. Among these, 150 classes are used for pre-training the diffusion models, while the remaining 461 classes serve as novel classes for constructing downstream tasks. Further details can be found in Appendix A.3. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW' optimizer but does not provide specific version numbers for any libraries, frameworks, or programming languages used for implementation. |
| Experiment Setup | Yes | For pre-training DiT, we train class-conditional latent DiTs of sizes -B and -L, with a latent patch size of p = 2 at a 256×256 image resolution on training classes. All models are trained using AdamW with a batch size of 256 and a constant learning rate of 1×10⁻⁴ over 300K steps. An exponential moving average (EMA) of DiT weights is used with a decay rate of 0.9999, and results are reported using the EMA model. During image generation, a classifier-free guidance (cfg) scale of 1.5 is applied. Performance is evaluated using Fréchet Inception Distance (FID) (Heusel et al., 2017), sFID (Nash et al., 2021), Fréchet DINO Distance (FDD) (Stein et al., 2023), Inception Score (Salimans et al., 2016) and Precision/Recall (Kynkäänniemi et al., 2019). Further details are provided in Appendix A.2. Table 6 presents the basic settings, including learning rate, training steps and the number of learngene components NG and tailor components NT for KIND integrating and diverting knowledge. |
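The EMA of model weights described in the setup (decay rate 0.9999, evaluation on the EMA copy) can be sketched as below. This is a minimal, framework-free illustration of the standard EMA update, not the paper's released code; the function and variable names are illustrative.

```python
# Minimal sketch of an exponential moving average (EMA) of model weights,
# matching the decay rate of 0.9999 quoted in the experiment setup.
# Weights are represented as plain lists of floats for illustration.

def ema_update(ema_weights, model_weights, decay=0.9999):
    """Return updated EMA weights: ema <- decay * ema + (1 - decay) * model."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]

# Toy usage: the EMA copy drifts slowly toward the live model weights,
# which is why results are reported with the EMA model after long training.
ema = [0.0, 0.0]
model = [1.0, 2.0]
for _ in range(1000):
    ema = ema_update(ema, model)
```

After 1000 steps each EMA weight equals `(1 - 0.9999**1000)` times the corresponding model weight, i.e. the EMA still lags well behind the live weights, reflecting the very slow decay used over 300K training steps.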