Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Authors: Yijun Dong, Viet Hoang Phan, Xiang Pan, Qi Lei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks. [...] 4 Experiments |
| Researcher Affiliation | Academia | Yijun Dong, Courant Institute, New York University, EMAIL; Hoang Phan, Center for Data Science, New York University, EMAIL; Xiang Pan, Center for Data Science, New York University, EMAIL; Qi Lei, Center for Data Science, New York University, EMAIL |
| Pseudocode | Yes | Algorithm 3.1 Sketchy Moment Matching (SkMM) |
| Open Source Code | Yes | Our experiment code for both the synthetic and real data is available at https://anonymous.4open.science/r/data_pruning. |
| Open Datasets | Yes | We further validate the effectiveness of SkMM on UTKFace [76], a real-world regression dataset for age estimation. [...] Stanford Cars [77] [...] CIFAR-10. [...] We consider a set of N = 2000 samples with high-dimensional pre-trained representations ϕ(X) ∈ ℝ^{N×r}, r = 2400, modeled by a Gaussian mixture model (GMM) |
| Dataset Splits | Yes | hyperparameter α tuning via grid search over 100 linearly spaced values in [10^-2, 10^2] with 2-fold cross-validation. |
| Hardware Specification | Yes | All the experiments could be done with A40 or even smaller GPUs. We use 4 workers and 32 GB Memory. |
| Software Dependencies | No | The paper mentions software components like 'CLIP', 'Adam', and 'ResNet18' but does not specify their version numbers for reproducibility. For example, 'We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10^-1.' and 'For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10^-2.'. |
| Experiment Setup | Yes | We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10^-1. [...] We optimize (5) via Adam [75] with constraint projection under learning rate 10^-7 for 10^4 iterations and sample S of size s ≪ N with the lowest objective value. [...] For LP, we learn the last layer over the embeddings from a CLIP-pretrained ViT-B/32 [50] with a learning rate of 10^-1. For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10^-2. In both settings, we optimize via Adam for 50 epochs. |
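The Dataset Splits row describes tuning a regularization hyperparameter α via grid search over 100 linearly spaced values in [10^-2, 10^2] with 2-fold cross-validation. The paper excerpt does not name the estimator or tooling, so the following is a minimal sketch of that search protocol, assuming a ridge-style regularizer, synthetic stand-in features, and scikit-learn's `GridSearchCV` (all assumptions, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for pretrained feature embeddings and regression targets
# (the paper uses CLIP/ResNet features; these random arrays are placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=200)

# 100 linearly spaced alpha values in [1e-2, 1e2], evaluated with
# 2-fold cross-validation, as described in the Dataset Splits row.
alphas = np.linspace(1e-2, 1e2, 100)
search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=KFold(n_splits=2))
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
```

Linearly spaced values (rather than the more common log-spaced grid) match the quoted phrasing "100 linearly spaced values in [10^-2, 10^2]".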