Color Transfer with Modulated Flows
Authors: Maria Larchenko, Alexander Lobashev, Dmitry Guskov, Vladimir Vladimirovich Palyulin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train an encoder on this dataset to predict the weights of a rectified model for new images. After training on a set of optimal transport plans, our approach can generate plans for new pairs of distributions without additional fine-tuning. We additionally show that the trained encoder provides an image embedding associated only with its color style. The presented method is capable of processing 4K images and achieves state-of-the-art performance in terms of content and style similarity. Experiments and Metrics. Dataset. To implement the approach described above, one needs a dataset of images with sufficiently diverse color distributions and resolutions. To achieve this diversity we construct our dataset by combining DIV2K (Ignatov, Timofte et al. 2019) and CLIC2020 (Toderici et al. 2020) (designed for image compression challenges) with a subset of laionart-en-colorcanny (Ghoskno 2023). The total number of images is 5,826. For every image we train a small two-layer MLP with 1024 hidden units (8195 parameters in total) and tanh activation, storing the resulting 5,826 rectified models in the dataset. Generating a model-image pair takes approximately 100k iterations with lr = 5e-4. Encoder. EfficientNet-B6 is used as the encoder model (Tan and Le 2019). For simplicity we set its output dimension to 8195 to match the dimensionality of the trained flows. The encoder was trained with the Adam optimiser (Kingma and Ba 2014) for 751k iterations with a batch size of 8 images. We decreased the learning rate from lr = 5e-4 to lr = 1e-4 after the first 100k iterations. Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). |
| Researcher Affiliation | Academia | Maria Larchenko, Alexander Lobashev, Dmitry Guskov, Vladimir Vladimirovich Palyulin Skolkovo Institute of Science and Technology, Moscow 121205, Russia |
| Pseudocode | Yes | Algorithm 1: Encoder training. Require: trained image-flow pairs (I, θ). 1: repeat 2: get batch I = {I_i}_N, θ = {θ_i}_N 3: for i = 1, ..., N do 4: sample X ~ I_i 5: Z = T_θ(X) 6: collect t ~ Uniform[0, 1] 7: collect Z_t = t·Z + (1 − t)·X 8: collect v_t = v_θ(Z_t, t) 9: end for 10: randomly reflect and rotate I → I′ 11: e = Enc(I′) 12: t = {t_i}_N, Z_t = {Z_t,i}_N, v_t = {v_t,i}_N 13: apply e as parameters for ModFlow to get v_e(Z_t, t) 14: take gradient step with respect to Enc weights on E‖v_t − v_e(Z_t, t)‖² 15: until converged |
| Open Source Code | Yes | Code https://github.com/maria-larchenko/modflows |
| Open Datasets | Yes | To achieve this diversity we construct our dataset by combining DIV2K (Ignatov, Timofte et al. 2019) and CLIC2020 (Toderici et al. 2020) (designed for image compression challenges) with a subset of laionart-en-colorcanny (Ghoskno 2023). The total number of images is 5,826. [...] Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). |
| Dataset Splits | No | Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). Searches were run on 25,000 Unsplash pictures. Our pictures are generated in 8 steps of ODE solver (16 steps in total for forward and inverse passes). The paper describes the dataset creation and a specific test set from Unsplash Lite, but does not explicitly provide details about the training, validation, and test splits for the encoder training on the combined dataset of 5,826 images. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, memory, or compute resources) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions various tools, libraries, and models used, such as 'Adam optimiser (Kingma and Ba 2014)', 'Efficient Net B6 (Tan and Le 2019)', and 'DISTS implementation is taken from piq library (Kastryulin, Zakirov, and Prokopenko 2019)', but does not provide specific version numbers for these or other core software dependencies like Python or PyTorch versions used in their implementation. |
| Experiment Setup | Yes | The encoder was trained with the Adam optimiser (Kingma and Ba 2014) for 751k iterations with a batch size of 8 images. We decreased the learning rate from lr = 5e-4 to lr = 1e-4 after the first 100k iterations. |
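The quoted 8195-parameter count for each per-image flow is consistent with a two-layer tanh MLP that maps a 4-dimensional input (RGB color plus time t) to a 3-dimensional velocity in color space. The input/output dimensions are our assumptions, not stated explicitly in the excerpts above, but they reproduce the stated count exactly. A minimal sketch:

```python
# Sketch of the per-image rectified-flow MLP described in the paper:
# two layers, 1024 hidden units, tanh activation, 8195 parameters.
# The 4-d input (r, g, b, t) and 3-d output (velocity in RGB space)
# are assumptions chosen to match the stated parameter count.

import numpy as np

HIDDEN = 1024
D_IN = 4   # (r, g, b, t) -- assumed
D_OUT = 3  # velocity in color space -- assumed

def n_params(d_in=D_IN, hidden=HIDDEN, d_out=D_OUT):
    """Parameter count of a d_in -> hidden -> d_out MLP with biases."""
    return (d_in * hidden + hidden) + (hidden * d_out + d_out)

def init_theta(rng):
    """Flat parameter vector theta, as stored per image in the dataset."""
    return rng.standard_normal(n_params()) * 0.01

def velocity(theta, z, t):
    """v_theta(z, t): tanh MLP applied to colors z of shape (N, 3) at time t."""
    w1 = theta[: D_IN * HIDDEN].reshape(D_IN, HIDDEN)
    b1 = theta[D_IN * HIDDEN : D_IN * HIDDEN + HIDDEN]
    off = D_IN * HIDDEN + HIDDEN
    w2 = theta[off : off + HIDDEN * D_OUT].reshape(HIDDEN, D_OUT)
    b2 = theta[off + HIDDEN * D_OUT :]
    x = np.concatenate([z, np.full((len(z), 1), t)], axis=1)
    return np.tanh(x @ w1 + b1) @ w2 + b2

print(n_params())  # 8195, matching the paper
```

Arithmetic check: 4·1024 + 1024 + 1024·3 + 3 = 8195, which also explains why the encoder's output dimension is set to 8195.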
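The pseudocode row (Algorithm 1) can be sketched as one training step. This is a toy, self-contained illustration: the dimensions are shrunk, the encoder is a stand-in random linear map (the paper uses EfficientNet-B6 with an 8195-d output), the coupling T_θ and the target velocity v_θ(Z_t, t) are replaced with simple stand-ins, and no backpropagation is performed.

```python
# Toy sketch of one step of Algorithm 1 (encoder training), assuming
# the modulated flow v_e is a tanh MLP whose flat weight vector e is
# predicted by the encoder. All concrete values here are stand-ins.

import numpy as np

rng = np.random.default_rng(0)
D_IN, HID, D_OUT = 4, 8, 3                        # toy dimensions
N_THETA = D_IN * HID + HID + HID * D_OUT + D_OUT  # flat weight vector length

def mod_flow(e, z, t):
    """v_e(z, t): tanh MLP with weights unpacked from the flat vector e."""
    w1 = e[: D_IN * HID].reshape(D_IN, HID)
    b1 = e[D_IN * HID : D_IN * HID + HID]
    off = D_IN * HID + HID
    w2 = e[off : off + HID * D_OUT].reshape(HID, D_OUT)
    b2 = e[off + HID * D_OUT :]
    x = np.concatenate([z, t[:, None]], axis=1)
    return np.tanh(x @ w1 + b1) @ w2 + b2

# --- one training step for a single (image, flow) pair ---
X = rng.random((256, 3))                     # colors sampled from image I
Z = X + 0.1                                  # stand-in for Z = T_theta(X)
t = rng.random(256)                          # t ~ Uniform[0, 1]
Zt = t[:, None] * Z + (1 - t[:, None]) * X   # Z_t = t*Z + (1 - t)*X
v_t = Z - X                                  # stand-in for v_theta(Z_t, t)

W_enc = rng.standard_normal((16, N_THETA)) * 0.01  # stand-in encoder
feat = rng.random(16)                        # stand-in features Enc(I')
e = feat @ W_enc                             # predicted flow weights

v_e = mod_flow(e, Zt, t)
loss = np.mean(np.sum((v_t - v_e) ** 2, axis=1))   # E||v_t - v_e||^2
# a real implementation backpropagates `loss` into the encoder weights
```

The design point the sketch makes concrete is the modulation step (line 13 of Algorithm 1): the encoder output is not an embedding fed into the flow, it literally *is* the flow's weight vector.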