Worst-case Feature Risk Minimization for Data-Efficient Learning
Authors: Jingshi Lei, Da Li, Chengming Xu, Liming Fang, Timothy Hospedales, Yanwei Fu
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate WFRM on two data-efficient learning tasks, including three standard DG benchmarks, PACS, VLCS and Office-Home, and the most challenging FSL benchmark Meta-Dataset. Despite the simplicity, our method consistently improves various DG and FSL methods, leading to the new state-of-the-art performances in all settings. Codes & models will be released at https://github.com/jslei/WFRM. |
| Researcher Affiliation | Collaboration | Jingshi Lei, School of Data Science, Fudan University; Da Li, Samsung AI Centre Cambridge; Chengming Xu, School of Data Science, Fudan University; Liming Fang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, and Science and Technology on Parallel and Distributed Processing Laboratory (PDL); Timothy Hospedales, Samsung AI Centre Cambridge and the University of Edinburgh; Yanwei Fu, School of Data Science, Fudan University |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | Codes & models will be released at https://github.com/jslei/WFRM. |
| Open Datasets | Yes | We evaluate WFRM on two data-efficient learning tasks, including three standard DG benchmarks, PACS, VLCS and Office-Home, and the most challenging FSL benchmark Meta-Dataset. We use DINO ViT-small (Caron et al., 2021) pretrained on the meta-train split of ImageNet (Deng et al., 2009). To verify the efficacy of our WFRM more thoroughly, we further conduct experiments on DomainBed benchmarks. |
| Dataset Splits | Yes | We use ResNet-18 (ImageNet pretrained) as our backbone and follow the official train/val split as per (Li et al., 2017). Unless otherwise specified, we follow the train/val/test split protocol in (Wang et al., 2020a; Zhou et al., 2021) and utilize the validation set to determine the value of ρ in all DG experiments. We use AlexNet (Krizhevsky et al., 2017) (ImageNet pretrained) as our backbone and follow the train/val protocols as per (Wang et al., 2020b). We use ResNet-18 (ImageNet pretrained) as the backbone model and follow the train/val split in (Zhou et al., 2021). Following the experimental setup of eTT, we use DINO ViT-small (Caron et al., 2021) pretrained on the meta-train split of ImageNet (Deng et al., 2009). We plug WFRM before the final linear transformation layer and fine-tune eTT with our WFRM on the meta-test splits of all the 10 sub-datasets for evaluation. Then, we follow exactly the default settings with the training-domain validation set used for model selection. |
| Hardware Specification | Yes | We use PyTorch (Paszke et al., 2019) and run our experiments on a GeForce GTX 1080 Ti GPU. Our experiments are run on four Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | The network is trained with M-SGD, batch size 16, momentum 0.9, learning rate 0.002 and weight decay 0.0005 for 50 epochs. No data augmentation strategy is used but all images are resized to 224×224. During training, our WFRM is inserted after the global average pooling layer. α is set to 0.5 throughout the experiments and ρ is set to 1.5. The network is trained with M-SGD, batch size 64, momentum 0.9, learning rate 0.0002 and weight decay 0.0005 for 30 epochs. Following (Wang et al., 2020b), we use random resized cropping, horizontal flipping and color jittering for data augmentation. During training, our WFRM is inserted after the FC7 layer and ρ is set to 6.5. The network is trained with M-SGD, batch size 32, momentum 0.9, learning rate 0.001 and weight decay 0.0005 for 50 epochs. The learning rate is decayed by 0.1 at the 40th epoch. Our data augmentation strategy includes random resized cropping, horizontal flipping and color jittering. During training, WFRM is again inserted after the global average pooling layer and ρ is set to 0.45. For our algorithm-specific hyperparameters, we use the selection method in (Xu et al., 2022). And ρ and α are respectively set to 10 and 0.5 for all 10 datasets. |
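The Experiment Setup row interleaves three distinct DG training recipes. A minimal sketch collecting them into hyperparameter dicts is below; since the released code is not yet available, the setup names (`resnet18_setup`, etc.), the dict layout, and the `lr_at_epoch` helper are all hypothetical, while every numeric value is transcribed from the quoted text.

```python
# Hypothetical transcription of the three DG training setups quoted above.
# Keys are generic because the excerpt does not name which setup pairs with
# which benchmark; all values come from the quoted hyperparameters.
DG_SETUPS = {
    "resnet18_setup": dict(optimizer="M-SGD", batch_size=16, momentum=0.9,
                           lr=0.002, weight_decay=5e-4, epochs=50,
                           rho=1.5, alpha=0.5, augmentation=None,
                           wfrm_location="after global average pooling"),
    "alexnet_setup": dict(optimizer="M-SGD", batch_size=64, momentum=0.9,
                          lr=0.0002, weight_decay=5e-4, epochs=30,
                          rho=6.5, wfrm_location="after FC7"),
    "resnet18_decay_setup": dict(optimizer="M-SGD", batch_size=32, momentum=0.9,
                                 lr=0.001, weight_decay=5e-4, epochs=50,
                                 rho=0.45, lr_milestone=40, lr_gamma=0.1,
                                 wfrm_location="after global average pooling"),
}

def lr_at_epoch(base_lr, epoch, milestone=40, gamma=0.1):
    """Step schedule for the third setup: the learning rate is
    multiplied by gamma (0.1) from the milestone (40th) epoch on."""
    return base_lr * (gamma if epoch >= milestone else 1.0)
```

This is only a config fragment for reproduction notes, not an implementation of WFRM itself.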