ELFS: Label-Free Coreset Selection with Proxy Training Dynamics

Authors: Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian Bartoldson, Bhavya Kailkhura, Atul Prakash

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ELFS on four vision benchmarks and show that, given the same vision encoder, ELFS consistently outperforms SOTA label-free baselines. For instance, when using SwAV as the encoder, ELFS outperforms D2 by up to 10.2% in accuracy on ImageNet-1K. |
| Researcher Affiliation | Academia | 1: University of Michigan; 2: Lawrence Livermore National Laboratory |
| Pseudocode | No | The paper describes the methodology in text and illustrates the pipeline in Figure 2, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available on GitHub: https://github.com/eltsai/elfs |
| Open Datasets | Yes | We evaluate ELFS on four vision benchmarks: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and ImageNet-1K (Deng et al., 2009). |
| Dataset Splits | Yes | After generating the pseudo-labeled dataset, we split it into 90% for training and 10% for validation. We use the validation set to determine the optimal β. |
| Hardware Specification | Yes | The grid search time for a single pruning rate is approximately 17 hours using four A6000 GPUs. |
| Software Dependencies | No | The paper mentions an AdamW optimizer, an SGD optimizer, and a cosine annealing learning rate scheduler, but does not provide version numbers for any software libraries or frameworks such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For pseudo-label generation, we use the training settings recommended in TEMI (Adaloglou et al., 2023). ... Training is conducted over 200 epochs with a batch size of 512, using an AdamW optimizer (Loshchilov & Hutter, 2017b) with a learning rate of 0.0001 and a weight decay of 0.0001. ... CIFAR10 and CIFAR100: We use a ResNet18 model for 40,000 iterations, with a batch size of 256 and SGD optimizer settings that include 0.9 momentum and 0.0002 weight decay. The initial learning rate is set at 0.1 with a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2017a). |
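The Experiment Setup row quotes a cosine annealing schedule (Loshchilov & Hutter, 2017a) that decays an initial learning rate of 0.1 over 40,000 iterations. As a minimal sketch of what that schedule computes (assuming a minimum learning rate of 0, the scheduler's common default; the paper does not state it):

```python
import math

def cosine_annealing_lr(step, total_steps=40_000, lr_max=0.1, lr_min=0.0):
    """Cosine annealing: decay from lr_max to lr_min over total_steps.

    Assumed defaults match the quoted setup (lr_max=0.1, 40,000 iterations);
    lr_min=0.0 is an assumption, not stated in the paper.
    """
    cos_factor = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_factor

print(cosine_annealing_lr(0))        # starts at lr_max = 0.1
print(cosine_annealing_lr(20_000))   # halfway through the cosine decay
print(cosine_annealing_lr(40_000))   # reaches lr_min at the final step
```

In a PyTorch training loop this would typically be handled by `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around the SGD optimizer rather than computed by hand.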