Categorical Schrödinger Bridge Matching

Authors: Grigoriy Ksenofontov, Alexander Korotin

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. ... We evaluate our CSBM algorithm across several setups. First, we analyze the convergence of D-IMF on discrete data (M4.1). Then, we demonstrate how CSBM performs with different reference processes in 2D experiments (M4.2). Next, we test CSBM s ability to translate images using the colored MNIST dataset (M4.3), varying the number of steps N. We then present an experiment on the Celeb A dataset (M4.4), showcasing CSBM s performance in a latent space. Finally, we explore the text domain by solving sentiment transfer on the Amazon Reviews dataset (Appendix C.4).
Researcher Affiliation Academia ¹Skoltech, Moscow, Russia; ²MIPT, Moscow, Russia; ³AIRI, Moscow, Russia. Correspondence to: Grigoriy Ksenofontov <EMAIL>, Alexander Korotin <EMAIL>.
Pseudocode Yes Algorithm 1 Categorical SB matching (CSBM)
Open Source Code Yes The code of CSBM is available at this repository.
Open Datasets Yes We test CSBM's ability to translate images using the colored MNIST dataset (§4.3)...Here, we present an unpaired image-to-image translation experiment on the CelebA dataset (§4.4)...This section examines the text domain, focusing on style transfer in the Amazon Reviews corpus (Ni et al., 2019).
Dataset Splits Yes For the CelebA experiment (§4.4)...We train the model on 162 770 pre-quantized images of celebrities. For evaluation, we compute FID and CMMD using 11 816 hold-out images...For the Amazon experiment (Appendix C.4)...The model is trained on 104 000 pre-tokenized reviews and evaluated on 2 000 reviews from the held-out test set.
Hardware Specification Yes Training the 2D experiment requires several hours on a single A100 GPU. The colored MNIST experiment takes approximately two days to train using two A100 GPUs. The most computationally demanding tasks, the CelebA and Amazon Reviews experiments, require around five days of training on four A100 GPUs.
Software Dependencies No The paper mentions several software tools and models, such as the 'AdamW optimizer', 'Hugging Face pipeline', 'GPT-2 Large', 'unigram SentencePiece model', and 'DiT model'. It also refers to official repositories like D3PM, VQ-GAN, VQ-Diffusion, and mdlm for implementation references. However, it does not explicitly provide specific version numbers for programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA versions.
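Since the paper does not pin dependency versions, a reproducer would need to record them independently. Below is a minimal, hypothetical sketch (not from the paper) of how the relevant versions could be captured with Python's standard library; the package names listed are illustrative guesses based on the tools the paper mentions.

```python
# Sketch: record the Python and package versions a reproduction would need.
# The package list is an assumption; the paper itself pins no versions.
import platform
import importlib.metadata as md

def environment_report(packages):
    """Return a dict mapping 'python' and each package name to a version string."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = md.version(name)
        except md.PackageNotFoundError:
            # Missing packages are noted rather than raising, so the
            # report can still be written from a partial environment.
            report[name] = "not installed"
    return report

if __name__ == "__main__":
    print(environment_report(["torch", "transformers", "sentencepiece"]))
```

Saving such a report (or simply a `pip freeze` dump) alongside the released code would resolve the missing-versions issue this row flags.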
Experiment Setup Yes Table 5. Hyperparameters for experiments. Lr denotes the learning rate, and m represents millions. Params indicates the number of model parameters, where for the CelebA dataset, the first value corresponds to the model and the second to the VQ-GAN.