Measuring Orthogonality in Representations of Generative Models
Authors: Robin C. Geyer, Alessandro Torcinovich, João B. S. Carvalho, Alexander Meyer, Joachim M. Buhmann
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Throughout extensive experiments on common downstream tasks, over several benchmark datasets and models, IWO and IWR consistently show stronger correlations with downstream task performance than traditional disentanglement metrics. |
| Researcher Affiliation | Academia | Robin C. Geyer (EMAIL), Department of Computer Science, ETH Zurich; Alessandro Torcinovich (EMAIL), Faculty of Engineering, Free University of Bozen-Bolzano, and Department of Computer Science, ETH Zurich; João B. Carvalho (EMAIL), Department of Computer Science, ETH Zurich; Alexander Meyer (EMAIL), German Heart Center of the Charité; Joachim M. Buhmann (EMAIL), Department of Computer Science, ETH Zurich |
| Pseudocode | Yes | This process is depicted in Figure 2, and a pseudocode implementation can be found in Appendix E. |
| Open Source Code | Yes | More details are listed in the appendix and in our open-source code implementation1. 1https://github.com/cyrusgeyer/iwo |
| Open Datasets | Yes | We consider six benchmark datasets, namely dSprites, Color dSprites and Scream dSprites (Matthey et al., 2017), Cars3D (Reed et al., 2015), smallNORB (LeCun et al., 2004) and Shapes3D (Burgess & Kim, 2018). |
| Dataset Splits | Yes | The dataset is split into a training set with five instances of each category and a test set with the remaining five instances. Data is split into a training (80%) and a test set (20%). During training, part of the training set is used for validation, which is in turn used as an early stopping criterion. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. While Section I mentions "No GPU was used" for GCA, this is specific to the GCA pipeline and not the broader experimental setup for training VAE models, which is a major part of the work. |
| Software Dependencies | No | The paper mentions using "the PyTorch Lightning framework" and "the Adam optimization scheme" but does not specify version numbers for PyTorch Lightning or any other key software libraries, which are necessary for reproducibility. |
| Experiment Setup | Yes | The hyperparameters considered for each model are the following: β-VAE (Higgins et al., 2017): β ∈ {1, 2, 4, 6, 8, 16}; Annealed VAE (Burgess et al., 2018): c_max ∈ {5, 10, 25, 50, 75, 100}; β-TCVAE (Chen et al., 2018): β ∈ {1, 2, 4, 6, 8, 10}; Factor-VAE (Kim & Mnih, 2018): γ ∈ {10, 20, 30, 40, 50, 100}; DIP-VAE-I (Kumar et al., 2018): λ_od ∈ {1, 2, 5, 10, 20, 50}; DIP-VAE-II (Kumar et al., 2018): λ_od ∈ {1, 2, 5, 10, 20, 50}. We further use the Adam optimization scheme as proposed by Kingma & Ba (2015) with a learning rate of 5 × 10⁻⁴ and a batch size of 128 for all optimizations. |
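The experiment setup above amounts to a 36-run sweep (six VAE variants, six hyperparameter values each) with shared optimizer settings. The following is a minimal sketch of that grid, assuming nothing about the authors' actual code: all names (`GRIDS`, `sweep_configs`) are illustrative, and only the grids, learning rate, and batch size are taken from the paper.

```python
from itertools import chain

# Per-model hyperparameter grids as quoted in the table above.
# Key: model name; value: (swept hyperparameter, candidate values).
GRIDS = {
    "beta_vae":     ("beta",      [1, 2, 4, 6, 8, 16]),
    "annealed_vae": ("c_max",     [5, 10, 25, 50, 75, 100]),
    "beta_tcvae":   ("beta",      [1, 2, 4, 6, 8, 10]),
    "factor_vae":   ("gamma",     [10, 20, 30, 40, 50, 100]),
    "dip_vae_i":    ("lambda_od", [1, 2, 5, 10, 20, 50]),
    "dip_vae_ii":   ("lambda_od", [1, 2, 5, 10, 20, 50]),
}

# Optimization settings shared by all runs (Adam, per the paper).
LEARNING_RATE = 5e-4
BATCH_SIZE = 128

def sweep_configs():
    """Yield one run configuration per (model, hyperparameter value) pair."""
    for model, (param, values) in GRIDS.items():
        for v in values:
            yield {
                "model": model,
                param: v,
                "lr": LEARNING_RATE,
                "batch_size": BATCH_SIZE,
            }

configs = list(sweep_configs())
print(len(configs))  # 6 models x 6 values each = 36 runs
```

Each emitted dict would then parameterize one training run; the paper reports using Adam (Kingma & Ba, 2015) with these settings for all of them.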