Efficient Model-Agnostic Multi-Group Equivariant Networks

Authors: Razan Baltaji, Sourya Basu, Lav R. Varshney

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For the first design, we provide experiments on multi-image classification where each view is transformed independently with transformations such as rotations. We find equivariant models are robust to such transformations and perform competitively otherwise. For the second design, we consider three applications: extending language compositionality on the SCAN dataset to product groups; fairness in natural language generation from GPT-2 to address intersectionality; and robust zero-shot image classification with CLIP. Overall, our methods are simple and general, competitive with equitune and its variants, while also being computationally more efficient.
Researcher Affiliation | Academia | Razan Baltaji, Sourya Basu, & Lav R. Varshney, Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Pseudocode | No | The paper contains mathematical formulations and descriptions of algorithms, but no explicit 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps.
Open Source Code | Yes | Code is available at https://github.com/baltaci-r/Multi-Group-Equivariant-Networks
Open Datasets | Yes | We perform experiments using two datasets: Caltech101 (Li et al., 2022) and 15Scene (Fei-Fei & Perona, 2005). We work on the SCAN-II dataset, where we have one train dataset and three different test dataset splits. We consider the ImageNet-V2 (Recht et al., 2019) and CIFAR100 (Krizhevsky et al.) image classification datasets. Further, in Tab. 2, we verify that Multi Equi GPT2 has a negligible drop in perplexity on the test sets of WikiText-2 and WikiText-103 compared to GPT2, and close to Equi GPT2.
Dataset Splits | Yes | We partition the train and test datasets for each label into tuples of N. We add random 90° rotations to the test images, and for training, we report results both with and without the transformations. We work on the SCAN-II dataset, where we have one train dataset and three different test dataset splits.
Hardware Specification | Yes | All multi-image classification experiments were done on a single Nvidia A100 GPU with 80GB memory in a compute cluster.
Software Dependencies | No | The paper mentions several software components, such as the SGD and Adam optimizers, BERT, GPT-2, CLIP, ReLU, batch norm, and dropout, but does not provide specific version numbers for these or for the ancillary software libraries/frameworks used for implementation.
Experiment Setup | Yes | We train each model for 100 epochs with a batch size of 64, using an SGD optimizer with a learning rate of 0.01, momentum of 0.9, and a weight decay of 0.001. Each model was pretrained on the train set for 200k iterations using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 10^-4 and a teacher-forcing ratio of 0.5 (Williams & Zipser, 1989). We test the non-equivariant pretrained models, along with equituned and multi-equituned models, where equitune and multi-equitune use a further 10k iterations of training on the train set. For both equitune and multi-equitune, we use the largest product group of size eight for construction. We use the cross-entropy loss as our training objective.
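Several rows above reference equitune-style models and robustness to independent 90° rotations of each view. A minimal, framework-free sketch of the underlying idea for an invariant task such as classification (average a model's class scores over all group-transformed copies of the input, here the four planar rotations of the C4 group) might look like the following; the toy `score` function, the class count, and the 2x2 image are illustrative assumptions, not taken from the paper.

```python
# Group-averaged ("equitune-style") invariant classifier over the C4
# group of 90-degree rotations. Because the orbit of a rotated input is
# the same set of images as the orbit of the original, the averaged
# scores are identical for any 90-degree rotation of the input.

def rot90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def orbit(img):
    """All four 90-degree rotations of the image (the C4 orbit)."""
    out = [img]
    for _ in range(3):
        out.append(rot90(out[-1]))
    return out

def score(img):
    """Toy non-equivariant 'model': arbitrary weighted pixel sums per class."""
    flat = [p for row in img for p in row]
    return [sum(p * (i + c) for i, p in enumerate(flat)) for c in range(3)]

def equitune_score(img):
    """Average the base model's class scores over the group orbit."""
    scores = [score(g) for g in orbit(img)]
    return [sum(s[c] for s in scores) / len(scores) for c in range(3)]

x = [[1, 2], [3, 4]]
assert equitune_score(x) == equitune_score(rot90(x))  # rotation-invariant
```

Note this is only the invariant special case; for genuinely equivariant outputs the averaged terms are additionally mapped back by the inverse group action before averaging.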
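The Experiment Setup row fixes the optimizer hyperparameters for multi-image classification: SGD with learning rate 0.01, momentum 0.9, and weight decay 0.001. A minimal sketch of what one such update step does, assuming the common PyTorch-style convention of folding the L2 weight-decay term into the gradient before the momentum update (the scalar weight and gradient values are illustrative only):

```python
# One SGD step with momentum and L2 weight decay, PyTorch-style:
#   g <- grad + weight_decay * w
#   v <- momentum * v + g
#   w <- w - lr * v

LR, MOMENTUM, WEIGHT_DECAY = 0.01, 0.9, 0.001  # values quoted above

def sgd_step(w, grad, v, lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    g = grad + weight_decay * w  # fold the L2 penalty into the gradient
    v = momentum * v + g         # update the velocity (momentum buffer)
    return w - lr * v, v         # descend along the velocity

w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.5, v=v)
# first step: g = 0.5 + 0.001*1.0 = 0.501, v = 0.501, w = 1.0 - 0.00501
```

Subsequent steps reuse the returned velocity, so gradients accumulate with the 0.9 momentum factor across the 100 training epochs described above.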