Diversified Flow Matching with Translation Identifiability
Authors: Sagar Shrestha, Xiao Fu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-world datasets validate the proposed method. We test our method over synthetic data and real-world applications (i.e., robot crowd route planning and unpaired image translation). The results corroborate with our theoretical analyses and algorithm design. Section 5. Experiments: Interpolant Construction. Baselines. 5.1. Synthetic Data Validation. 5.2. Image Translation. 5.3. Swarm Navigation. |
| Researcher Affiliation | Academia | 1School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA. Correspondence to: Xiao Fu <EMAIL>. |
| Pseudocode | Yes | The proposed algorithm is referred to as DFM and is detailed in Algorithm 1 in Appendix B. Appendix B.2. Algorithm DFM. Algorithm 1 DFM. Appendix D.1. Algorithm 2 DFM with Interleaved Training. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository. The "Impact Statement" section does not address code availability. |
| Open Datasets | Yes | We use images of human faces from the CelebA-HQ dataset (Karras et al., 2017) with 30,000 images as the source data px and Bitmoji faces with 4084 images (Mozafari, 2020) as the target data py. Bitmoji faces (Mozafari, 2020). https://www.kaggle.com/datasets/mostafamozafari/bitmoji-faces, 2020. Accessed on January 20th, 2025. The surface is specified by LiDAR measurements of Mt. Rainier (Legg & Anderson, 2013) containing 34,183 points. Legg, N. and Anderson, S. Southwest flank of Mt. Rainier, WA, 2013. https://opentopography.org/meta/OT.052013.26910.1, 2013. Accessed on January 28, 2025. |
| Dataset Splits | No | The paper mentions using "randomly selected 5000 images from CelebA-HQ" and "K = 4000 samples for each of the conditional distributions" for swarm navigation. However, it does not provide specific train/validation/test dataset splits, percentages, or methodology for partitioning the data for any of the experiments. |
| Hardware Specification | No | The paper does not explicitly state any specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or other processor information used for running the experiments. It only mentions that "All FM-based methods are trained on the latent space of the VAE from Stable Diffusion v1" but does not specify the hardware on which this training was conducted. |
| Software Dependencies | Yes | All FM-based methods are trained on the latent space of the VAE from Stable Diffusion v1 (Rombach et al., 2022). We use the UNet architecture (Ronneberger et al., 2015). We adopt a similar hyperparameter configuration based on the UNet architecture (Dhariwal & Nichol, 2021). For the vector field, we use the AdamW optimizer (Loshchilov, 2017). We take the exponential moving average (EMA) (Tarvainen & Valpola, 2017). |
| Experiment Setup | Yes | 5.1. Synthetic Data Validation: We use a two-layer MLP with 64 hidden units and SeLU activations to represent vt(·; ϕ) as well as Iθ. We use an Adam optimizer with an initial learning rate of 0.001 for vt and 0.0001 for Iθ. We use a batch size of 512. We run both phases of Algorithm 1 for 2000 iterations. Appendix D.1. Unpaired Image-to-Image Translation: For the vector field, we use the AdamW optimizer... with an initial learning rate of 10^-4 and parameters β1 = 0.9, β2 = 0.999, ϵ = 1e-8, and no weight decay. We use a batch size of 64, and dropout of 0.1. We use the same hyperparameter settings for the interpolant, except that the learning rate is set to 10^-8, head channels is 32 and attention resolution is 8. We use σ1 = 0.1 and σ2 = 10. We train the models for 100k iterations. C.1. Swarm Navigation: We use a 3-layer MLP with 64 hidden units and SeLU activation to represent both Iθ and vt(·; ϕ). We use the Adam optimizer for both the interpolant and the vector field with initial learning rates of 10^-4 and 10^-3, respectively. We use a weight decay of 10^-5 for both networks. We use σ1 = 0.1, σ2 = 1.5. |
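The synthetic-data setup quoted above (a two-layer MLP with 64 hidden units and SeLU activations, trained with Adam) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the paper does not state a framework, so NumPy is used here for self-containment, and all function names (`init_mlp`, `mlp_forward`, `adam_step`) are invented for this sketch.

```python
# Hypothetical sketch of the Section 5.1 setup: a two-layer MLP with
# 64 hidden units and SeLU activations, updated with Adam. Not the
# authors' implementation; hyperparameters follow the quoted text.
import numpy as np

# SELU constants (Klambauer et al., 2017)
SELU_ALPHA, SELU_SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def init_mlp(d_in, d_hidden, d_out, rng):
    """Two-layer MLP: d_in -> d_hidden (SeLU) -> d_out."""
    return {
        "W1": rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in),
        "b1": np.zeros(d_hidden),
        "W2": rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden),
        "b2": np.zeros(d_out),
    }

def mlp_forward(params, x):
    h = selu(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def adam_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. Per the paper: lr=1e-3 for the vector field v_t,
    lr=1e-4 for the interpolant I_theta."""
    state["t"] += 1
    for k in params:
        state["m"][k] = b1 * state["m"][k] + (1 - b1) * grads[k]
        state["v"][k] = b2 * state["v"][k] + (1 - b2) * grads[k] ** 2
        m_hat = state["m"][k] / (1 - b1 ** state["t"])
        v_hat = state["v"][k] / (1 - b2 ** state["t"])
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, state
```

The loss computation, the time-conditioning of vt, and the two training phases of Algorithm 1 are omitted, since the report does not reproduce those details; a batch of 512 samples per step would match the quoted configuration.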