Categorical Schrödinger Bridge Matching
Authors: Grigoriy Ksenofontov, Alexander Korotin
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. ... We evaluate our CSBM algorithm across several setups. First, we analyze the convergence of D-IMF on discrete data (§4.1). Then, we demonstrate how CSBM performs with different reference processes in 2D experiments (§4.2). Next, we test CSBM's ability to translate images using the colored MNIST dataset (§4.3), varying the number of steps N. We then present an experiment on the CelebA dataset (§4.4), showcasing CSBM's performance in a latent space. Finally, we explore the text domain by solving sentiment transfer on the Amazon Reviews dataset (Appendix C.4). |
| Researcher Affiliation | Academia | 1Skoltech, Moscow, Russia 2MIPT, Moscow, Russia 3AIRI, Moscow, Russia. Correspondence to: Grigoriy Ksenofontov <EMAIL>, Alexander Korotin <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Categorical SB matching (CSBM) |
| Open Source Code | Yes | The code of CSBM is available at this repository. |
| Open Datasets | Yes | We test CSBM's ability to translate images using the colored MNIST dataset (§4.3)...Here, we present an unpaired image-to-image translation experiment on the CelebA dataset (§4.4)...This section examines the text domain, focusing on style transfer in the Amazon Reviews corpus (Ni et al., 2019). |
| Dataset Splits | Yes | For the CelebA experiment (§4.4)...We train the model on 162 770 pre-quantized images of celebrities. For evaluation, we compute FID and CMMD using 11 816 hold-out images...For the Amazon experiment (Appendix C.4)...The model is trained on 104 000 pre-tokenized reviews and evaluated on 2 000 reviews from the held-out test set. |
| Hardware Specification | Yes | Training the 2D experiment requires several hours on a single A100 GPU. The colored MNIST experiment takes approximately two days to train using two A100 GPUs. The most computationally demanding tasks, the CelebA and Amazon Reviews experiments, require around five days of training on four A100 GPUs. |
| Software Dependencies | No | The paper mentions several software tools and models, such as the 'AdamW optimizer', 'Hugging Face pipeline', 'GPT-2 Large', 'unigram SentencePiece model', and 'DiT model'. It also refers to official repositories like D3PM, VQ-GAN, VQ-Diffusion, and mdlm for implementation references. However, it does not explicitly provide version numbers for programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | Table 5. Hyperparameters for experiments. Lr denotes the learning rate, and m represents millions. Params indicates the number of model parameters, where for the CelebA dataset, the first value corresponds to the model and the second to the VQ-GAN. |