Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
Authors: Sheng-Feng Yu, Jia-Jiun Yao, Wei-Chen Chiu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on various datasets validate the superiority of our approach in terms of distillation efficiency, cross-architecture generalization, and transfer learning performance. |
| Researcher Affiliation | Collaboration | National Yang Ming Chiao Tung University; Macronix International Co., Ltd. |
| Pseudocode | Yes | Here we provide the pseudocode (i.e. Algorithm 1) together with a detailed but compact explanation to emphasize the systematic approach of our proposed method for self-supervised dataset distillation, which begins with initializing the framework, proceeds through a bilevel optimization process, and ends with training approximation networks to capture representation shifts due to the augmentations (e.g. rotations). |
| Open Source Code | No | The paper mentions using 'solo-learn library (da Costa et al., 2022)' for training the teacher model, but it does not explicitly state that the authors' own implementation code for the methodology described in the paper is open-source or provide a link to it. |
| Open Datasets | Yes | Datasets. CIFAR100 (Krizhevsky, 2009), Tiny ImageNet (Le & Yang, 2015), and ImageNet (Deng et al., 2009) are taken as our source datasets for performing self-supervised DD, while the distilled dataset is evaluated upon the target datasets (which include the source datasets themselves, CIFAR10 (Krizhevsky, 2009), CUB2011 (Wah et al., 2011), and Stanford Dogs (Khosla et al., 2011)). |
| Dataset Splits | Yes | The distilled dataset is evaluated upon the target datasets (which include the source datasets themselves, CIFAR10 (Krizhevsky, 2009), CUB2011 (Wah et al., 2011), and Stanford Dogs (Khosla et al., 2011) for the classification). ... The goal of our distilled dataset (...) is for further use of training a new model (...) to mimic the characteristics of the self-supervisedly pretrained teacher model gϕ, its evaluation follows the typical linear evaluation scheme of self-supervised learning works: the new model (...) learnt from (...) is frozen and coupled with a linear classifier, where the linear classifier is trained upon the supervised dataset of a downstream task. |
| Hardware Specification | Yes | Computational cost of distilling CIFAR-100 with storage buffer N = 100 using a single Nvidia RTX 4090 GPU card. |
| Software Dependencies | No | The inner model adopted in our approach utilizes convolutional layers that include batch normalization (Ioffe & Szegedy, 2015), ReLU activation, and average pooling. ... To optimize our distilled dataset, we employ the AdamW optimizer (Loshchilov & Hutter, 2019)... The ResNet18 model (He et al., 2016) serves as a self-supervised teacher gϕ and is trained with the Barlow Twins objective (Zbontar et al., 2021) (where the training is based on the solo-learn library (da Costa et al., 2022)). The paper mentions software components and libraries used but does not specify their version numbers. |
| Experiment Setup | Yes | The model pool for inner models (...) consists of 10 models, which are initialized and updated via full-batch gradient descent, with learning rate and momentum set to 0.1 and 0.9, respectively. The update steps Z are 1,000. To optimize our distilled dataset, we employ the AdamW optimizer (...), starting with a learning rate of 0.001 that is linearly decayed. This distillation process involves 30,000 outer iterations for CIFAR100 and 20,000 for Tiny ImageNet and ImageNet. ... Upon completion of the distillation process and stepping forward to evaluation, we pretrain a model (...) on the distilled dataset for 1,000 epochs. This pretraining employs a stochastic gradient descent (SGD) optimizer with a mini-batch size of 256, where the learning rate and momentum are maintained at 0.1 and 0.9, respectively. The weight decay parameters used during pretraining of the feature extractor are listed in Table 5; we set them depending on the size of the distilled dataset. For training the linear classifier to conduct linear evaluation, we standardize the experimental settings to utilize the SGD optimizer with a momentum of 0.9, excluding weight decay, and initiate the learning rate of the task-specific head to 0.2 with cosine scheduling. |
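The linear-evaluation settings quoted in the Experiment Setup row (SGD with momentum 0.9, no weight decay, base learning rate 0.2 with cosine scheduling) can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation; the function names `cosine_lr` and `sgd_momentum_step` are ours, and only the hyperparameter values come from the paper.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 0.2) -> float:
    """Cosine-annealed learning rate, decaying from base_lr to 0.

    base_lr = 0.2 matches the task-specific head setting quoted above.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.9):
    """One SGD update with momentum and no weight decay (per the quoted setting)."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

if __name__ == "__main__":
    total = 100
    print(cosine_lr(0, total))      # base lr 0.2 at the first step
    print(cosine_lr(total, total))  # decays to ~0 at the last step
```

In a real run these would correspond to `torch.optim.SGD(..., lr=0.2, momentum=0.9, weight_decay=0.0)` combined with a cosine learning-rate scheduler.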