ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
Authors: Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian Bartoldson, Bhavya Kailkhura, Atul Prakash
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ELFS on four vision benchmarks and show that, given the same vision encoder, ELFS consistently outperforms SOTA label-free baselines. For instance, when using SwAV as the encoder, ELFS outperforms D2 by up to 10.2% in accuracy on ImageNet-1K. |
| Researcher Affiliation | Academia | 1. University of Michigan 2. Lawrence Livermore National Laboratory |
| Pseudocode | No | The paper describes the methodology in text and illustrates a pipeline in Figure 2, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available on GitHub: https://github.com/eltsai/elfs |
| Open Datasets | Yes | We evaluate ELFS on four vision benchmarks: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and ImageNet-1K (Deng et al., 2009). |
| Dataset Splits | Yes | After generating the pseudo-labeled dataset, we split it into 90% for training and 10% for validation. We use the validation set to determine the optimal β. |
| Hardware Specification | Yes | The grid search time for a single pruning rate is approximately 17 hours using four A6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'Adam W optimizer' and 'SGD optimizer' along with a 'cosine annealing learning rate scheduler', but does not provide specific version numbers for any software libraries or frameworks like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For pseudo-label generation, we use the training settings recommended in TEMI (Adaloglou et al., 2023). ... Training is conducted over 200 epochs with a batch size of 512, using an AdamW optimizer (Loshchilov & Hutter, 2017b) with a learning rate of 0.0001 and a weight decay of 0.0001. ... CIFAR10 and CIFAR100: We use a ResNet18 model for 40,000 iterations, with a batch size of 256 and SGD optimizer settings that include 0.9 momentum and 0.0002 weight decay. The initial learning rate is set at 0.1 with a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2017a). |
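The CIFAR10/CIFAR100 budget quoted above is stated in iterations rather than epochs, so a quick conversion helps when re-creating the schedule. The sketch below is a back-of-the-envelope check, not code from the ELFS repository; the only assumption beyond the quoted numbers is the standard 50,000-image CIFAR train split.

```python
# Hedged sketch: convert the quoted iteration budget into epochs.
# 40,000 iterations at batch size 256 is taken from the paper's setup;
# the 50,000-image train split is the standard CIFAR-10/100 size.

def iterations_to_epochs(iterations: int, batch_size: int, train_size: int) -> float:
    """Return the (fractional) number of epochs an iteration budget covers."""
    return iterations * batch_size / train_size

CIFAR_TRAIN_SIZE = 50_000  # standard CIFAR-10/CIFAR-100 train split

epochs = iterations_to_epochs(40_000, 256, CIFAR_TRAIN_SIZE)
print(f"{epochs:.1f} epochs")  # ~204.8 epochs on the full training set
```

Note that ELFS trains on pruned coresets, so the effective epoch count over a selected subset is proportionally higher at larger pruning rates.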