Manifold Contrastive Learning with Variational Lie Group Operators

Authors: Kion Fallah, Alec Helbling, Kyle A. Johnsen, Christopher John Rozell

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "We demonstrate benefits in self-supervised benchmarks for image datasets, as well as a downstream semi-supervised task. In the former case, we demonstrate that the proposed methods can effectively apply manifold feature augmentations and improve learning both with and without a projection head. In the latter case, we demonstrate that feature augmentations sampled from learned Lie group operators can improve classification performance when using few labels."
Researcher Affiliation: Academia — "Kion Fallah (EMAIL), Alec Helbling, Kyle A. Johnsen, Christopher J. Rozell, ML@GT, Georgia Institute of Technology, Atlanta, GA 30332"
Pseudocode: Yes — Algorithm 1 (Variational Sparse Coding):

    Input: positive pair z_i and z̃_i; whether to use a soft threshold;
           threshold hyper-parameter ζ; number of samples J
    (µ_i, b_i) ← g_ϕ(sg[z_i, z̃_i])
    for j = 1 to J do
        s_i^j ← µ_i + b_i ⊙ sign(ε_i^j) ⊙ ln(1 − 2|ε_i^j|)    # ε_i^j ∼ Uniform(−1/2, 1/2)
        if soft threshold then
            c_i^j ← s_i^j + sg[T_ζ(s_i^j) − s_i^j]
        else
            c_i^j ← s_i^j
        end if
        L_m^j ← L_m(z_i, z̃_i, c_i^j)
    end for
    c_i ← arg min_j L_m^j
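The sampling loop in Algorithm 1 can be sketched in pure Python. This is a minimal illustration, not the paper's implementation: the names (`soft_threshold`, `sample_coefficients`, `loss_fn`) are ours, the code operates on plain lists rather than batched tensors, and the straight-through `sg[·]` trick is irrelevant here since there is no autograd — only the forward soft-threshold pass is shown.

```python
import math
import random

def soft_threshold(s, zeta):
    # T_zeta(s): shrinkage operator; values with |s| <= zeta map to zero
    return math.copysign(max(abs(s) - zeta, 0.0), s)

def sample_coefficients(mu, b, loss_fn, J=20, zeta=None):
    """Best-of-J Laplace sampling, a sketch of Algorithm 1.

    mu, b   : per-dictionary-element Laplace location and scale (from g_phi)
    loss_fn : scalar manifold loss L_m evaluated at a coefficient vector
    zeta    : soft-threshold level (None disables thresholding)
    """
    best_c, best_loss = None, float("inf")
    for _ in range(J):
        c = []
        for loc, scale in zip(mu, b):
            eps = random.uniform(-0.5, 0.5)  # eps ~ Uniform(-1/2, 1/2)
            # Laplace reparameterization: s = loc + scale * sign(eps) * ln(1 - 2|eps|)
            s = loc + scale * math.copysign(1.0, eps) * math.log(1.0 - 2.0 * abs(eps))
            if zeta is not None:
                s = soft_threshold(s, zeta)  # forward pass of c = s + sg[T_zeta(s) - s]
            c.append(s)
        loss = loss_fn(c)
        if loss < best_loss:  # keep the sample with the smallest manifold loss
            best_c, best_loss = c, loss
    return best_c
```

For example, with `loss_fn = lambda c: sum(x * x for x in c)` the routine returns the sparsest of the J draws in the least-squares sense; in the paper the loss is the manifold loss L_m on the positive pair.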
Open Source Code: Yes — "Code available at https://github.com/kfallah/manifold-contrastive."
Open Datasets: Yes — "We demonstrate the efficacy of our approach on self-supervised and semi-supervised benchmarks with image datasets (Krizhevsky, 2009; Coates et al., 2011; Deng et al., 2009)."
Dataset Splits: Yes — "We test on a variety of datasets, including CIFAR10 (Krizhevsky, 2009), STL10 (Coates et al., 2011), and Tiny ImageNet (Deng et al., 2009)."
Hardware Specification: Yes — "We train all methods with a single NVIDIA A100 GPU, with an approximate runtime of 9 hours for baselines and 24 hours for Manifold CLR on Tiny ImageNet."
Software Dependencies: No — the quoted training details (the same passage as Experiment Setup) specify architecture, optimizer, and hyper-parameters but name no libraries or software versions.
Experiment Setup: Yes — "For each dataset, we train a ResNet-18 (He et al., 2016) with the AdamW optimizer (Loshchilov & Hutter, 2019) for 1000 epochs using a batch size of 512. We set the backbone and projection head learning rate to 3.0e-3 for CIFAR10 and 2.0e-3 for STL10 and Tiny ImageNet. For every model, we set the learning rate of the Lie group operators and coefficient encoder to 1.0e-3 and 1.0e-4, respectively. We set the weight decay equal to 1.0e-5 for the backbone, projection head, and coefficient encoder and equal to 1.0e-3 for the Lie group operators. We use a cosine annealing scheduler with 10 warm-up epochs and a minimum learning rate of 1.0e-5 for all parameters."
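The quoted schedule (cosine annealing, 10 warm-up epochs, minimum learning rate 1.0e-5) can be sketched as a per-epoch learning-rate function. This is an illustrative reconstruction, assuming linear warm-up over the first 10 epochs; the paper's exact warm-up shape and scheduler implementation are not specified, and the function name `lr_at_epoch` is ours.

```python
import math

def lr_at_epoch(epoch, base_lr, total_epochs=1000, warmup_epochs=10, min_lr=1.0e-5):
    """Cosine-annealed learning rate with linear warm-up (0-indexed epochs).

    Assumes the warm-up ramps linearly from base_lr/warmup_epochs up to
    base_lr, then cosine-anneals from base_lr down to min_lr.
    """
    if epoch < warmup_epochs:
        # linear warm-up: reaches base_lr at the last warm-up epoch
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine annealing over the remaining epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

With `base_lr = 3.0e-3` (the CIFAR10 backbone rate above), the schedule peaks at 3.0e-3 at epoch 9 and decays toward 1.0e-5 by epoch 999.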