Restructuring Vector Quantization with the Rotation Trick
Authors: Christopher Fifty, Ronald Junkins, Dennis Duan, Aniketh Iyengar, Jerry Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error. ... In this section, we evaluate the effect of the rotation trick across many different VQ-VAE paradigms. ... Tables 1-5 display experimental results with various metrics such as r-FID, r-IS, and codebook usage. |
| Researcher Affiliation | Collaboration | 1Stanford University, 2Google DeepMind |
| Pseudocode | Yes | Algorithm 1: The Rotation Trick. Require: input example x. e ← Encoder(x); q ← nearest codebook vector to e; R ← rotation matrix that aligns e to q; q̃ ← stop-gradient⟨(‖q‖/‖e‖) R⟩ e; x̂ ← Decoder(q̃); loss ← L(x, x̂); return loss |
| Open Source Code | Yes | Our code is available at https://github.com/cfifty/rotation_trick. |
| Open Datasets | Yes | We begin with a straightforward evaluation: training a VQ-VAE to reconstruct examples from ImageNet (Deng et al., 2009). ... VQGANs ... on ImageNet and the combined dataset FFHQ (Karras et al., 2019) and CelebA-HQ (Karras, 2017). ... video reconstructions from the BAIR Robot dataset (Ebert et al., 2017) and from the UCF101 action recognition dataset (Soomro, 2012). |
| Dataset Splits | Yes | We log both training and validation set reconstruction metrics. Of note, we compute reconstruction FID (Heusel et al., 2017) and reconstruction IS (Salimans et al., 2016) on reconstructions from the full ImageNet validation set as a measure of reconstruction quality. |
| Hardware Specification | No | The paper mentions 'Due to GPU VRAM constraints' in Appendix A.10.4, but does not provide specific GPU models, CPU models, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper references specific GitHub repositories for implementations (e.g., 'https://github.com/lucidrains/vector-quantize-pytorch', 'https://github.com/CompVis/taming-transformers') and Hugging Face, but does not list explicit version numbers for general software libraries like Python, PyTorch, CUDA, etc. |
| Experiment Setup | Yes | A complete description of both training settings is provided in Table 9 of the Appendix. ... Table 8 summarizes the hyperparameters used for the experiments in Section 5.1. ... Table 10: Hyperparameters for the experiments in Table 4. ... Table 11: Hyperparameters for the experiments in Table 5. |
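The pseudocode row above summarizes the rotation trick: the encoder output e is mapped to the codebook vector q through a rescaled rotation held under stop-gradient, rather than by copying gradients straight through. The following is a minimal NumPy sketch of that forward computation, not the authors' PyTorch implementation (see their repository for that); the function name and the Householder-style construction of R are illustrative assumptions, and since NumPy has no autograd, the stop-gradient behavior is only noted in comments.

```python
import numpy as np

def rotation_trick_forward(e, q):
    """Forward pass of the rotation trick (illustrative NumPy sketch).

    Maps the encoder output e to lambda * R @ e, where R rotates e's
    direction onto that of the codebook vector q and lambda rescales
    to q's norm. In training, lambda and R are treated as constants
    (stop-gradient), so gradients flow to e through the fixed linear
    map lambda * R; this sketch shows only the forward value.
    """
    e_hat = e / np.linalg.norm(e)
    q_hat = q / np.linalg.norm(q)
    # Householder-style construction: R = I - 2 r r^T + 2 q_hat e_hat^T,
    # with r = (e_hat + q_hat) / ||e_hat + q_hat||, rotates e_hat to q_hat.
    r = (e_hat + q_hat) / np.linalg.norm(e_hat + q_hat)
    d = e.shape[0]
    R = np.eye(d) - 2.0 * np.outer(r, r) + 2.0 * np.outer(q_hat, e_hat)
    lam = np.linalg.norm(q) / np.linalg.norm(e)
    return lam * R @ e

rng = np.random.default_rng(0)
e = rng.standard_normal(8)   # encoder output
q = rng.standard_normal(8)   # nearest codebook vector
q_tilde = rotation_trick_forward(e, q)
# The forward value matches the straight-through value q exactly;
# only the backward pass differs from the straight-through estimator.
print(np.allclose(q_tilde, q))  # True
```

Because R e = ‖e‖ q̂ and λ = ‖q‖/‖e‖, the decoder still receives exactly q in the forward pass; the trick changes only how gradients reach the encoder.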