Meta-Learning via Classifier(-free) Diffusion Guidance
Authors: Elvis Nava, Seijin Kobayashi, Yifei Yin, Robert K. Katzschmann, Benjamin F. Grewe
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approaches outperform existing multi-task and meta-learning methods in a series of zero-shot learning experiments on our Meta-VQA dataset. |
| Researcher Affiliation | Academia | Elvis Nava EMAIL ETH AI Center & INI & Soft Robotics Lab, ETH Zurich Seijin Kobayashi EMAIL Dept. of Computer Science, ETH Zurich Yifei Yin EMAIL Dept. of Computer Science, ETH Zurich Robert K. Katzschmann EMAIL Soft Robotics Lab, D-MAVT, ETH Zurich Benjamin F. Grewe EMAIL Institute of Neuroinformatics, University of Zurich & ETH Zurich |
| Pseudocode | Yes | Algorithm 1 HyperCLIP Training; Algorithm 2 Unconditional Multitask Training; Algorithm 3 Unconditional MNet-MAML Training; Algorithm 4 Unconditional HNet-MAML Training; Algorithm 5 Conditional Multitask Training; Algorithm 6 Conditional Multitask FiLM Training; Algorithm 7 Conditional HNet-MAML Training; Algorithm 8 HVAE Training; Algorithm 9 HNet + HyperCLIP Training; Algorithm 10 HVAE + HyperCLIP Training; Algorithm 11 HyperCLIP guidance (Inference time); Algorithm 12 HNet + HyperLDM Training; Algorithm 13 HVAE + HyperLDM Training; Algorithm 14 HyperLDM Inference |
| Open Source Code | Yes | Our code is available at https://github.com/elvisnava/hyperclip. |
| Open Datasets | Yes | We demonstrate the usefulness of our methods on Meta-VQA, our modification of the VQA v2.0 dataset (Goyal et al., 2017) built to reflect the multi-task setting with natural language task descriptors. |
| Dataset Splits | Yes | In the end, our Meta-VQA dataset is composed of 1234 unique tasks (questions), split into 870 training tasks and 373 test tasks, for a total of 104112 image-answer pairs. There are on average 9.13 answer choices per question/task. The average size of the support set is 57.85 examples, while the average size of the query set is 25.9 examples. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It mentions the base network model (CLIP-Adapter with ViT-L/14@336px CLIP encoder) but gives no details about GPUs, CPUs, or memory. |
| Software Dependencies | No | The paper mentions several techniques and models, such as the Adam optimizer and CLIP, but does not provide version numbers for any software dependency. For example, it cites Adam (Kingma & Ba, 2017) but not the specific version of the Adam implementation used. |
| Experiment Setup | Yes | Table 3: Hyperparameters used for the baseline methods. All methods are trained with the Adam (Kingma & Ba, 2017) optimizer, with a meta-batch size of 32 tasks. We use gradient norm clipping for all optimization, with the maximum norm set to 10. Note that when the adaptation algorithm A has a range of possible steps, the number of steps is sampled uniformly from the range for every adaptation. For HVAE + HyperCLIP guidance and HVAE + HyperLDM, we trained a VAE for 2000 epochs... with the Adam (Kingma & Ba, 2017) optimizer and 0.0001 learning rate and batch size 32... To train the HyperCLIP model... We trained our HyperCLIP model for 600 epochs with the Adam (Kingma & Ba, 2017) optimizer, 0.0003 learning rate, and batch size 64 for all our experiments. We parametrize the diffusion process with a linear noise schedule, β starting at 0.0001 and ending at 0.06, and 350 diffusion timesteps. For all our experiments, we train the HyperLDM for 1000 epochs with the Adam optimizer, 0.00025 learning rate, and batch size 128. |
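The diffusion hyperparameters quoted in the Experiment Setup row (linear β schedule from 0.0001 to 0.06 over 350 timesteps) pin down the forward noising process. The sketch below shows what that schedule looks like under the standard DDPM parameterization; the helper `q_sample` and its parameterization are assumptions for illustration, not code from the paper's repository.

```python
import numpy as np

# Linear noise schedule as quoted in the Experiment Setup row:
# beta ramps linearly from 0.0001 to 0.06 over 350 diffusion timesteps.
T = 350
betas = np.linspace(1e-4, 0.06, T)

# Standard DDPM bookkeeping (an assumption; the paper's exact
# parameterization may differ): per-step and cumulative signal retention.
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng=None):
    """Noise a flattened weight/latent vector x0 to timestep t
    (hypothetical helper, not from the paper's codebase)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
```

With these values, almost all signal is destroyed by the final timestep (the cumulative product ᾱ_T is far below 1%), which is the usual requirement for the reverse diffusion model to start from near-pure noise.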