A CLIP-Powered Framework for Robust and Generalizable Data Selection

Authors: Suorong Yang, Peng Ye, Wanli Ouyang, Dongzhan Zhou, Furao Shen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our approach consistently outperforms existing state-of-the-art baselines on various benchmark datasets. Notably, our method effectively removes noisy or damaged samples from the dataset, enabling it to achieve even higher performance with less data. This indicates that it is not only a way to accelerate training but can also improve overall data quality. The implementation is available at https://github.com/Jackbrocp/clip-powered-data-selection. ... 4 EXPERIMENT: Performance Comparisons. Consistent with prior works Xia et al. (2023); Sorscher et al. (2022), we report top-1 accuracy on CIFAR-100 and Tiny-ImageNet, and top-5 accuracy on ImageNet-1k.
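The metrics quoted above (top-1 on CIFAR-100/Tiny-ImageNet, top-5 on ImageNet-1k) count a prediction as correct when the true label appears among the model's k highest-scoring classes. A minimal pure-Python sketch of that metric (the function name and list-of-lists input format are illustrative, not from the paper):

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    logits: one list of per-class scores per sample
    labels: integer class index per sample
    """
    correct = 0
    for scores, y in zip(logits, labels):
        # indices of the k largest scores for this sample
        topk = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        correct += y in topk
    return correct / len(labels)
```

With k=1 this reduces to ordinary accuracy; top-5 simply widens the window to the five best-scoring classes, which is the conventional relaxation for 1000-class ImageNet.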
Researcher Affiliation | Academia | 1 National Key Laboratory for Novel Software Technology, Nanjing University; 2 Shanghai Artificial Intelligence Laboratory; 3 The Chinese University of Hong Kong
Pseudocode | Yes | The complete workflow is outlined in Algorithm 1 in Appendix B. ... Algorithm 1 The general workflow. Input: dataset D, total number of training samples N, total number of epochs T, selection ratio sr, a threshold θ, the pretrained image and text encoders are E_I and E_T, the fine-tuned image and text adapters are A_I and A_T, respectively.
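Algorithm 1's body lives in the paper's Appendix B; only its inputs are quoted here. Purely as a hypothetical illustration of how a threshold θ and a selection ratio sr might jointly carve out a subset from per-sample scores (every name below is an assumption, not the authors' code):

```python
def select_subset(scores, sr, theta):
    """Hypothetical selection step: drop samples scoring below theta,
    then keep at most an sr-fraction of the original dataset by score."""
    # indices of samples passing the quality threshold theta
    kept = [i for i, s in enumerate(scores) if s >= theta]
    # rank the survivors by score, highest first
    kept.sort(key=lambda i: scores[i], reverse=True)
    budget = int(sr * len(scores))  # sr is the target selection ratio
    return kept[:budget]
```

This is only a plausible shape for such a procedure; the actual interplay of θ, sr, and the adapter-refined CLIP features is defined by the paper's Algorithm 1.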
Open Source Code | Yes | The implementation is available at https://github.com/Jackbrocp/clip-powered-data-selection.
Open Datasets | Yes | Comprehensive evaluation across various benchmark datasets demonstrates that our approach effectively improves the performance of selected datasets, especially on large-scale datasets such as ImageNet-1k Deng et al. (2009). ... we evaluate the effectiveness of our proposed method on various popularly used benchmark datasets, including CIFAR-10/100 Krizhevsky et al. (2009), Tiny-ImageNet Chrabaszcz et al. (2017), and ImageNet-1k Deng et al. (2009).
Dataset Splits | Yes | We evaluate the effectiveness of our proposed method on various popularly used benchmark datasets, including CIFAR-10/100 Krizhevsky et al. (2009), Tiny-ImageNet Chrabaszcz et al. (2017), and ImageNet-1k Deng et al. (2009). ... All hyperparameters and experimental settings for training on different selected datasets are kept the same.
Hardware Specification | Yes | The devices used are 4 NVIDIA RTX 2080 Ti GPUs and an Intel(R) CPU E5-2678 @ 2.50GHz. ... The devices used are 2 NVIDIA RTX 2080 Ti GPUs and an Intel(R) CPU E5-2678 @ 2.50GHz. ... Table 3: Performance and saved costs (%) on ImageNet-1k across Swin-T, ViT-B, and ViT-L on a 4-A100-GPU server.
Software Dependencies | No | For experiments on ImageNet-1k, following Xia et al. (2023); Sorscher et al. (2022); Yang et al. (2024b), the VISSL library Goyal et al. (2021) is exploited. ... The implementation is based on the public GitHub repository https://github.com/kentaroy47/vision-transformers-cifar10. Specifically, we utilize the ViT-small to train on the selected datasets.
Experiment Setup | Yes | Parameter settings. The parameters in our proposed method can be easily set. The coefficient α is proportional to the expected selection rate sr, balancing the importance of dataset diversity, i.e., α can be set equivalent to sr. The coefficient β is set to 2 across datasets to adjust the numerical differences among loss items. For more details, please refer to Appendix C. ... C IMPLEMENTATION DETAILS: Training on the Selected Datasets. Closely following previous works Xia et al. (2023); Yang et al. (2023c); Sorscher et al. (2022), for experiments on CIFAR-10/100, we adopt a batch size of 128, an SGD optimizer with a momentum of 0.9, a weight decay of 5e-4, an initial learning rate of 0.1, and a total training epoch of 200. The learning rate is divided by 5 after the 60th, the 120th, and the 160th epoch. For experiments on Tiny-ImageNet, we adopt a batch size of 256, an SGD optimizer with a momentum of 0.9, a weight decay of 1e-4, an initial learning rate of 0.1, and a total epoch of 90. The learning rate is divided by 10 after the 30th and the 60th epoch. For experiments on ImageNet-1k, following Xia et al. (2023); Sorscher et al. (2022); Yang et al. (2024b), the VISSL library Goyal et al. (2021) is exploited. We adopt a base learning rate of 0.01, a batch size of 256, an SGD optimizer with a momentum of 0.9, a weight decay of 1e-3, and a total epoch of 105. All experiments are conducted by three individual runs with different random seeds, while on ImageNet-1k, due to the huge training costs, the experiment in each case is performed once. Unless specified, the network architecture used is the ResNet-50 model. All hyperparameters and experimental settings for training on different selected datasets are kept the same. Fine-tuning the Adapters. We adopt an initial learning rate of 1e-4, the Adam optimizer with a step size of 30 epochs, a decay factor of 0.1, and a total epoch of 30.
The batch size is set to 256 on CIFAR-10/100, 64 on Tiny-ImageNet, and 512 on ImageNet-1k. Selection Optimization. We adopt an SGD optimizer with a momentum of 0.9, an initial learning rate of 1e-3, and a total training iteration of 1e5.
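The quoted step decays (CIFAR: base rate 0.1 divided by 5 after epochs 60, 120, 160; Tiny-ImageNet: base rate 0.1 divided by 10 after epochs 30 and 60) follow a standard multi-step schedule. A small helper makes the rule concrete; the function is an illustrative sketch, not the authors' training code, and it assumes the drop takes effect at the milestone epoch itself:

```python
def multistep_lr(epoch, base_lr, milestones, factor):
    """Learning rate after dividing base_lr by `factor` once per passed milestone."""
    drops = sum(epoch >= m for m in milestones)
    return base_lr / factor ** drops

# CIFAR-10/100 schedule from the quoted setup: 0.1, divided by 5 at 60/120/160
cifar_lr = lambda epoch: multistep_lr(epoch, 0.1, (60, 120, 160), 5)
# Tiny-ImageNet schedule: 0.1, divided by 10 at 30/60
tiny_lr = lambda epoch: multistep_lr(epoch, 0.1, (30, 60), 10)
```

In a PyTorch setup this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 120, 160]` and `gamma=0.2` (dividing by 5 equals multiplying by 0.2).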