Restructuring Vector Quantization with the Rotation Trick

Authors: Christopher Fifty, Ronald Junkins, Dennis Duan, Aniketh Iyengar, Jerry Liu, Ehsan Amid, Sebastian Thrun, Christopher Ré

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error. ... In this section, we evaluate the effect of the rotation trick across many different VQ-VAE paradigms. ... Tables 1-5 display experimental results with various metrics such as r-FID, r-IS, and codebook usage.
Researcher Affiliation | Collaboration | ¹Stanford University, ²Google DeepMind
Pseudocode | Yes | Algorithm 1: The Rotation Trick
  Require: input example x
    e ← Encoder(x)
    q ← nearest codebook vector to e
    R ← rotation matrix that aligns e to q
    q̃ ← stop-gradient⟨(‖q‖/‖e‖) R⟩ e
    x̂ ← Decoder(q̃)
    loss ← L(x, x̂)
    return loss
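The algorithm above can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation: `rotation_align` and `rotation_trick_forward` are names chosen here, the rotation is built from two Householder reflections (one standard construction for aligning a pair of unit vectors), and the stop-gradient is only indicated by a comment, since numpy has no autograd.

```python
import numpy as np

def rotation_align(e, q):
    """Rotation matrix R with R @ (e/|e|) == q/|q|, built as a product of
    two Householder reflections (each has det -1, so R has det +1).
    Assumes e and q are nonzero and not antiparallel."""
    e_hat = e / np.linalg.norm(e)
    q_hat = q / np.linalg.norm(q)
    r = e_hat + q_hat                    # bisector of e_hat and q_hat
    r = r / np.linalg.norm(r)
    I = np.eye(e.shape[0])
    # Reflecting e_hat through the hyperplane normal to r gives -q_hat;
    # a second reflection through the hyperplane normal to q_hat gives q_hat.
    return (I - 2.0 * np.outer(q_hat, q_hat)) @ (I - 2.0 * np.outer(r, r))

def rotation_trick_forward(e, q):
    """Forward pass of the rotation trick: q_tilde = (|q|/|e|) R e.
    In an autograd framework the factor (|q|/|e|) R would be wrapped in
    stop-gradient (e.g. .detach() in PyTorch), so the backward pass treats
    it as a constant linear map applied to the encoder output e."""
    lam = np.linalg.norm(q) / np.linalg.norm(e)
    R = rotation_align(e, q)
    # Numerically this equals q in the forward pass, but gradients would
    # flow to e through the detached rotation rather than being copied.
    return lam * (R @ e)
```

Because the scaled rotation maps e exactly onto q, the decoder sees the same input as with standard straight-through estimation; only the gradient path back to the encoder changes.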
Open Source Code | Yes | Our code is available at https://github.com/cfifty/rotation_trick.
Open Datasets | Yes | We begin with a straightforward evaluation: training a VQ-VAE to reconstruct examples from ImageNet (Deng et al., 2009). ... VQGANs ... on ImageNet and the combined dataset FFHQ (Karras et al., 2019) and CelebA-HQ (Karras, 2017). ... video reconstructions from the BAIR Robot dataset (Ebert et al., 2017) and from the UCF101 action recognition dataset (Soomro, 2012).
Dataset Splits | Yes | We log both training and validation set reconstruction metrics. Of note, we compute reconstruction FID (Heusel et al., 2017) and reconstruction IS (Salimans et al., 2016) on reconstructions from the full ImageNet validation set as a measure of reconstruction quality.
Hardware Specification | No | The paper mentions 'Due to GPU VRAM constraints' in Appendix A.10.4, but does not provide specific GPU models, CPU models, or other detailed hardware specifications used for running the experiments.
Software Dependencies | No | The paper references specific GitHub repositories for implementations (e.g., 'https://github.com/lucidrains/vector-quantize-pytorch', 'https://github.com/CompVis/taming-transformers') and Hugging Face, but does not list explicit version numbers for general software libraries like Python, PyTorch, or CUDA.
Experiment Setup | Yes | A complete description of both training settings is provided in Table 9 of the Appendix. ... Table 8 summarizes the hyperparameters used for the experiments in Section 5.1. ... Table 10: Hyperparameters for the experiments in Table 4. ... Table 11: Hyperparameters for the experiments in Table 5.