DisCo: Improving Compositional Generalization in Visual Reasoning through Distribution Coverage
Authors: Joy Hsu, Jiayuan Mao, Jiajun Wu
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply DisCo to visual question answering, with three backbone networks (FiLM, TbD-net, and the Neuro-Symbolic Concept Learner), and demonstrate that it consistently enhances performance on a variety of compositional generalization tasks with varying levels of train data bias. |
| Researcher Affiliation | Academia | Joy Hsu EMAIL Department of Computer Science, Stanford University Jiayuan Mao EMAIL Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Jiajun Wu EMAIL Department of Computer Science, Stanford University |
| Pseudocode | Yes | Algorithm 1 The DisCo framework described in Section 3.2. |
| Open Source Code | Yes | Code for DisCo with the FiLM model can be found: https://github.com/joyhsu0504/disco, based on the FiLM codebase (https://github.com/ethanjperez/film). |
| Open Datasets | Yes | In addition to the original CLEVR compositional generalization (CoGenT) dataset (Johnson et al., 2017) (released under the CC BY 4.0 license), we also report results on multiple CoGen datasets based on CLEVR. |
| Dataset Splits | Yes | In our construction, the train set of CoGen split A consists of 8,000 images, and the validation set of CoGen split A and the test set of CoGen split B consist of 2,000 images each. The larger, unseen test set consists of 8,000 images. |
| Hardware Specification | Yes | All models are trained on a single Titan RTX GPU. |
| Software Dependencies | No | The paper mentions software components like StyleGAN2 and the Adam optimizer, and base implementations for VAE and SimCLR, but it does not specify version numbers for general software dependencies like Python, PyTorch, or TensorFlow, nor for the specific implementations or optimizers. |
| Experiment Setup | Yes | The GAN image proposal function is the unconditional StyleGAN2 (Karras et al., 2020), trained with the Adam optimizer with a learning rate of 0.002. We set our entropy threshold n to be at the 30th percentile. |
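The experiment setup above fixes the entropy threshold n at the 30th percentile of per-sample entropy scores. A minimal sketch of how such a percentile cutoff can be computed (the `entropy_threshold` helper and the score list are hypothetical illustrations, not from the paper's code):

```python
def entropy_threshold(entropies, percentile=30):
    """Cutoff value at the given percentile of a set of entropy scores,
    using linear interpolation between order statistics (the paper sets
    its threshold n at the 30th percentile)."""
    xs = sorted(entropies)
    # Fractional rank of the requested percentile within the sorted scores.
    rank = (len(xs) - 1) * percentile / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])

# Toy example with hypothetical per-sample entropy scores.
scores = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.7, 0.8, 0.6, 1.0]
threshold = entropy_threshold(scores)  # 30th-percentile cutoff over `scores`
```

Samples would then be compared against `threshold` when deciding which GAN proposals to keep, per the framework's filtering step.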