Diffusion-based Neural Network Weights Generation

Authors: Bedionita Soro, Bruno Andreis, Hayeon Lee, Wonyong Jeong, Song Chong, Frank Hutter, Sung Ju Hwang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate D2NWG across multiple experimental settings. On in-distribution tasks, our framework achieves performance that is on par with or superior to conventional pretrained models, while also serving as an effective initialization strategy for novel domains, resulting in faster convergence and a 6% improvement in few-shot learning scenarios. Extensive ablation studies further indicate that our approach scales robustly with increased diversity and volume of pretrained models.
Researcher Affiliation | Collaboration | Soro Bedionita (1), Bruno Andreis (1), Hayeon Lee (1), Wonyong Jeong (3), Song Chong (1), Frank Hutter (2), Sung Ju Hwang (1,3). Affiliations: 1 KAIST, 2 University of Freiburg, 3 DeepAuto.ai. Correspondence to: Soro Bedionita <EMAIL>, Bruno Andreis <EMAIL>, Hayeon Lee <EMAIL>, Wonyong Jeong <EMAIL>, Song Chong <EMAIL>, Frank Hutter <EMAIL>, Sung Ju Hwang <EMAIL>
Pseudocode | Yes |
Algorithm 1: Sequential Weight Model Improvement
1: Input: Initial weights Θ_init = {θ_1, ..., θ_L}; hypernetwork H_i for each layer i; validation dataset D_val; K candidates per layer
2: Output: Final weights Θ = {θ_1, ..., θ_L}
3: Initialize Θ = Θ_init
4: Compute initial validation accuracy: current_accuracy = A(Θ_init, D_val)
5: for each layer i = 1 to L do
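The algorithm excerpt above can be sketched as a greedy layer-wise replacement loop. This is a minimal illustration, not the authors' implementation: `sample_layer_weights`-style hypernetwork callables, the `evaluate` function, and the weight representation are hypothetical stand-ins, and the per-layer acceptance rule (keep a candidate only if validation accuracy improves) is inferred from the stated inputs and outputs.

```python
def sequential_weight_improvement(init_weights, hypernetworks, evaluate, k_candidates):
    """Greedily replace each layer's weights with the best of K sampled candidates.

    init_weights:  list of per-layer weights [theta_1, ..., theta_L]
    hypernetworks: list of callables; hypernetworks[i]() samples weights for layer i
    evaluate:      callable mapping a full weight list to validation accuracy
    k_candidates:  number of candidates to try per layer
    """
    weights = list(init_weights)
    current_accuracy = evaluate(weights)          # step 4 of Algorithm 1
    for i in range(len(weights)):                 # step 5: sweep layers in order
        for _ in range(k_candidates):
            candidate = hypernetworks[i]()        # sample new weights for layer i
            trial = weights[:i] + [candidate] + weights[i + 1:]
            trial_accuracy = evaluate(trial)
            if trial_accuracy > current_accuracy:  # keep only improving candidates
                weights, current_accuracy = trial, trial_accuracy
    return weights, current_accuracy
```

With scalar "weights" and `sum` as a toy accuracy, the loop keeps only candidates that raise the score, matching the greedy per-layer intent of the excerpt.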
Open Source Code | No | The paper does not provide an explicit statement of, or link to, its own source code for the described methodology. It only references a third-party GitHub repository for pretrained model architectures and parameter counts in Section C.2.
Open Datasets | Yes | We utilize the mini-ImageNet and tiered-ImageNet datasets for this task. We partitioned ImageNet-1k into 20k subsets... We collected 30 real-world datasets (Ullah et al., 2022). We used the modelzoo of Schürholt et al. (2022c), consisting of a ConvNet trained on MNIST, SVHN, CIFAR-10, and STL-10. We use six tasks from the GLUE benchmark... We evaluate on several benchmarks (Beeching et al., 2023): the AI2 Reasoning Challenge for grade-school science questions, HellaSwag for commonsense inference, and Winogrande for commonsense reasoning. We evaluate the robustness of our best models on the Open LLM Leaderboard (Fourrier et al., 2024).
Dataset Splits | Yes | We evaluate the performance on 600 subsets from the unseen test split for 1-shot and 5-shot. ... The number of images per class in the query set is fixed to 15 for all methods, and 600 tasks are used for testing. We partitioned ImageNet-1k into 20k subsets of 10 to 50 classes each, with 50 images per class per subset... 70% of the resulting modelzoo is used for training, with 15% each for validation and testing.
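The 70/15/15 modelzoo split quoted above can be sketched as follows. This is a minimal illustration under assumptions not stated in the paper: the shuffling, the seed, and the `split_modelzoo` helper are hypothetical, and the modelzoo is stood in for by an arbitrary list of checkpoints.

```python
import random

def split_modelzoo(models, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle a list of model checkpoints and split it into
    train/validation/test partitions (default 70%/15%/15%)."""
    models = list(models)
    random.Random(seed).shuffle(models)  # deterministic shuffle for reproducibility
    n = len(models)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (models[:n_train],
            models[n_train:n_train + n_val],
            models[n_train + n_val:])
```

For a zoo of 100 checkpoints this yields partitions of 70, 15, and 15 models with no overlap.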
Hardware Specification | Yes | All experiments use a single Titan RTX GPU, except the LLM experiments, which used a single A100 GPU.
Software Dependencies | No | The paper mentions optimizers such as Adam and AdamW (Rombach et al., 2021) and a linear scheduler, but it does not specify software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | The VAE and the dataset encoder are trained using the Adam optimizer with a learning rate of 1e-4. The diffusion model in each experiment is trained with a linear scheduler, a base learning rate of 1e-4, and the AdamW optimizer (Rombach et al., 2021).
Table 8: Parameter | Value
Epochs | [50, 2000]
Optimizer | Adam
Learning Rate | 1e-3
Latent Dimension | 1024
KL-Divergence Weight | 1e-6
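The reported optimizer and scheduler settings can be sketched as a PyTorch configuration. This is a hedged sketch, not the authors' training code: the framework (PyTorch), the `LinearLR` scheduler choice, the placeholder models, and the step count are assumptions; only the optimizer types and learning rates mirror the values quoted above.

```python
import torch

# Placeholder modules standing in for the paper's VAE / dataset encoder
# and latent diffusion model; the real architectures are not shown here.
vae = torch.nn.Linear(1024, 1024)
diffusion = torch.nn.Linear(1024, 1024)

# VAE and dataset encoder: Adam with learning rate 1e-4 (as reported).
vae_opt = torch.optim.Adam(vae.parameters(), lr=1e-4)

# Diffusion model: AdamW with base learning rate 1e-4 and a linear
# schedule; the decay horizon (`total_iters`) is an assumed value.
diff_opt = torch.optim.AdamW(diffusion.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.LinearLR(
    diff_opt, start_factor=1.0, end_factor=0.0, total_iters=10_000
)
```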