HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

Authors: Nabarun Goswami, Yusuke Mukuta, Tatsuya Harada

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To assess the effectiveness of our proposed method, we conducted experiments across diverse tasks, encompassing image reconstruction, generative modeling, image classification, and feature disentanglement. The implementation details and results of each experiment are elaborated upon in the following subsections."
Researcher Affiliation | Academia | Nabarun Goswami (EMAIL), The University of Tokyo; Yusuke Mukuta (EMAIL), The University of Tokyo and RIKEN; Tatsuya Harada (EMAIL), The University of Tokyo and RIKEN
Pseudocode | Yes | "Algorithm 1: HyperVQ Training"
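The paper's Algorithm 1 is not reproduced here, but the Euclidean baseline it replaces (KmeansVQ, i.e. standard nearest-codebook vector quantization as in VQ-VAE) can be sketched in a few lines. This is an illustrative sketch of generic k-means-style quantization only, not the paper's MLR-based hyperbolic variant; the function and variable names are assumptions.

```python
import numpy as np

def kmeans_vq(z, codebook):
    """Quantize encoder outputs to their nearest codebook entries.

    z: (N, D) array of encoder outputs.
    codebook: (K, D) array of code vectors.
    Returns (indices, quantized latents).
    """
    # Squared Euclidean distance between every latent and every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)      # index of the nearest code for each latent
    return idx, codebook[idx]   # quantized latents (straight-through in training)

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
idx, zq = kmeans_vq(z, codebook)
# idx -> [0, 1]
```

HyperVQ's contribution is to replace this nearest-neighbor assignment with hyperbolic multinomial logistic regression (MLR), so the decision boundaries live in hyperbolic space rather than being Voronoi cells in Euclidean space.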
Open Source Code | No | No explicit statement or link to the paper's source code is provided. The paper mentions using third-party libraries (geoopt) and referencing existing implementations (Hyperbolic Neural Networks++), but does not provide its own.
Open Datasets | Yes | "To assess the generative modeling capabilities of the proposed HyperVQ method, we conducted experiments on the CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets. ... we trained a HyperVQ VAE model on the simple MNIST dataset. ... incorporating corruptions from the ImageNet-C dataset (Hendrycks & Dietterich, 2019) for data augmentation. ... conducted pre-training on the LibriSpeech 960h dataset."
Dataset Splits | Yes | "To assess the generative modeling capabilities of the proposed HyperVQ method, we conducted experiments on the CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets. ... Table 1: Comparison of reconstruction mean squared error (MSE) for KmeansVQ and HyperVQ on CIFAR-100 (test) and ImageNet (validation) datasets for different codebook sizes, K. ... Figure 3a displays some reconstructions achieved with HyperVQ on the ImageNet validation set (128×128) ... Classification accuracy is reported on the clean validation set of ImageNet, along with accuracy on known and unknown corruptions in the ImageNet-C dataset. ... conducted pre-training on the LibriSpeech 960h dataset."
Hardware Specification | Yes | "Model training was conducted on 4 A100 GPUs utilizing the Adam optimizer (Kingma & Ba, 2014), with a learning rate set at 3e-4 and a batch size of 128 per GPU, unless otherwise specified."
Software Dependencies | No | The paper mentions using the PyTorch library and the geoopt library but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "Model training was conducted on 4 A100 GPUs utilizing the Adam optimizer (Kingma & Ba, 2014), with a learning rate set at 3e-4 and a batch size of 128 per GPU, unless otherwise specified. For this experiment, in addition to KmeansVQ and HyperVQ, we also included GumbelVQ for comparison. All models underwent training for 500 epochs, utilizing the same settings as described in Section 5.1."
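The quoted setup fully pins down the shared training hyperparameters. A minimal sketch of that configuration, with the model and dataloader left as placeholders since the paper does not release its architectures, might look like the following (only the numeric values are taken from the paper; everything else is an assumption):

```python
# Hyperparameters quoted from the paper's experiment setup; the dict
# keys and surrounding structure are illustrative, not from the paper.
config = {
    "optimizer": "Adam",          # Kingma & Ba, 2014
    "learning_rate": 3e-4,
    "batch_size_per_gpu": 128,
    "num_gpus": 4,                # A100s
    "epochs": 500,                # generative-modeling comparison runs
}

# With data-parallel training across 4 GPUs, the effective batch size is:
effective_batch = config["batch_size_per_gpu"] * config["num_gpus"]
# effective_batch -> 512
```

In a PyTorch reimplementation these values would feed directly into `torch.optim.Adam(model.parameters(), lr=config["learning_rate"])` and a distributed sampler over the 4 GPUs.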