Color Transfer with Modulated Flows
Authors: Maria Larchenko, Alexander Lobashev, Dmitry Guskov, Vladimir Vladimirovich Palyulin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train an encoder on this dataset to predict the weights of a rectified model for new images. After training on a set of optimal transport plans, our approach can generate plans for new pairs of distributions without additional fine-tuning. We additionally show that the trained encoder provides an image embedding associated only with its color style. The presented method is capable of processing 4K images and achieves state-of-the-art performance in terms of content and style similarity. Experiments and Metrics. Dataset. To implement the approach described above, one needs a dataset of images with sufficiently diverse color distributions and resolutions. To achieve this diversity we construct our dataset by combining DIV2K (Ignatov, Timofte et al. 2019) and CLIC2020 (Toderici et al. 2020) (designed for image compression challenges) with a subset of laionart-en-colorcanny (Ghoskno 2023). The total number of images is 5,826. For every image we train a small two-layer MLP with 1024 hidden units (8195 parameters in total) and tanh activation, storing the resulting 5,826 rectified models in the dataset. Generating a model-image pair takes approximately 100k iterations with lr = 5e-4. Encoder. EfficientNet-B6 is used as the encoder model (Tan and Le 2019). For simplicity we set its output dimension to 8195 to match the dimensionality of the trained flows. The encoder was trained with the Adam optimiser (Kingma and Ba 2014) for 751k iterations with a batch size of 8 images. We decreased the learning rate from lr = 5e-4 to lr = 1e-4 after the first 100k iterations. Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). |
| Researcher Affiliation | Academia | Maria Larchenko, Alexander Lobashev, Dmitry Guskov, Vladimir Vladimirovich Palyulin Skolkovo Institute of Science and Technology, Moscow 121205, Russia |
| Pseudocode | Yes | Algorithm 1: Encoder training. Require: trained image-flow pairs (I, θ). 1: repeat 2: get batch I = {I_i}_N, θ = {θ_i}_N 3: for i = 1, ..., N do 4: sample X ~ I_i 5: Z = T_θ(X) 6: collect t ~ Uniform[0, 1] 7: collect Z_t = t·Z + (1 − t)·X 8: collect v_t = v_θ(Z_t, t) 9: end for 10: randomly reflect and rotate I → I′ 11: e = Enc(I′) 12: t = {t_i}_N, Z_t = {Z_t,i}_N, v_t = {v_t,i}_N 13: apply e as parameters for ModFlow to get v_e(Z_t, t) 14: take gradient step with respect to Enc weights on E‖v_t − v_e(Z_t, t)‖² 15: until converged |
| Open Source Code | Yes | Code https://github.com/maria-larchenko/modflows |
| Open Datasets | Yes | To achieve this diversity we construct our dataset by combining DIV2K (Ignatov, Timofte et al. 2019) and CLIC2020 (Toderici et al. 2020) (designed for image compression challenges) with a subset of laionart-en-colorcanny (Ghoskno 2023). The total number of images is 5,826. [...] Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). |
| Dataset Splits | No | Test set. Tests were conducted on 1891 content-style pairs selected from Unsplash Lite 1.2.2 (Unsplash 2023). Searches were run on 25,000 Unsplash pictures. Our pictures are generated in 8 steps of ODE solver (16 steps in total for forward and inverse passes). The paper describes the dataset creation and a specific test set from Unsplash Lite, but does not explicitly provide details about the training, validation, and test splits for the encoder training on the combined dataset of 5,826 images. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, memory, or compute resources) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions various tools, libraries, and models used, such as 'Adam optimiser (Kingma and Ba 2014)', 'Efficient Net B6 (Tan and Le 2019)', and 'DISTS implementation is taken from piq library (Kastryulin, Zakirov, and Prokopenko 2019)', but does not provide specific version numbers for these or other core software dependencies like Python or PyTorch versions used in their implementation. |
| Experiment Setup | Yes | The encoder was trained with the Adam optimiser (Kingma and Ba 2014) for 751k iterations with a batch size of 8 images. We decreased the learning rate from lr = 5e-4 to lr = 1e-4 after the first 100k iterations. |
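The quoted 8195-parameter count for each per-image flow is consistent with a two-layer tanh MLP that maps a 4-dimensional input (RGB color plus time t) to a 3-dimensional velocity in color space. The input/output dimensions are our assumptions, not stated explicitly in the excerpts above, but they reproduce the stated count exactly. A minimal sketch:

```python
# Sketch of the per-image rectified-flow MLP described in the paper:
# two layers, 1024 hidden units, tanh activation, 8195 parameters.
# The 4-d input (r, g, b, t) and 3-d output (velocity in RGB space)
# are assumptions chosen to match the stated parameter count.

import numpy as np

HIDDEN = 1024
D_IN = 4   # (r, g, b, t) -- assumed
D_OUT = 3  # velocity in color space -- assumed

def n_params(d_in=D_IN, hidden=HIDDEN, d_out=D_OUT):
    """Parameter count of a d_in -> hidden -> d_out MLP with biases."""
    return (d_in * hidden + hidden) + (hidden * d_out + d_out)

def init_theta(rng):
    """Flat parameter vector theta, as stored per image in the dataset."""
    return rng.standard_normal(n_params()) * 0.01

def velocity(theta, z, t):
    """v_theta(z, t): tanh MLP applied to colors z of shape (N, 3) at time t."""
    w1 = theta[: D_IN * HIDDEN].reshape(D_IN, HIDDEN)
    b1 = theta[D_IN * HIDDEN : D_IN * HIDDEN + HIDDEN]
    off = D_IN * HIDDEN + HIDDEN
    w2 = theta[off : off + HIDDEN * D_OUT].reshape(HIDDEN, D_OUT)
    b2 = theta[off + HIDDEN * D_OUT :]
    x = np.concatenate([z, np.full((len(z), 1), t)], axis=1)
    return np.tanh(x @ w1 + b1) @ w2 + b2

print(n_params())  # 8195, matching the paper
```

Arithmetic check: 4·1024 + 1024 + 1024·3 + 3 = 8195, which also explains why the encoder's output dimension is set to 8195.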
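The pseudocode row (Algorithm 1) can be sketched as one training step. This is a toy, self-contained illustration: the dimensions are shrunk, the encoder is a stand-in random linear map (the paper uses EfficientNet-B6 with an 8195-d output), the coupling T_θ and the target velocity v_θ(Z_t, t) are replaced with simple stand-ins, and no backpropagation is performed.

```python
# Toy sketch of one step of Algorithm 1 (encoder training), assuming
# the modulated flow v_e is a tanh MLP whose flat weight vector e is
# predicted by the encoder. All concrete values here are stand-ins.

import numpy as np

rng = np.random.default_rng(0)
D_IN, HID, D_OUT = 4, 8, 3                        # toy dimensions
N_THETA = D_IN * HID + HID + HID * D_OUT + D_OUT  # flat weight vector length

def mod_flow(e, z, t):
    """v_e(z, t): tanh MLP with weights unpacked from the flat vector e."""
    w1 = e[: D_IN * HID].reshape(D_IN, HID)
    b1 = e[D_IN * HID : D_IN * HID + HID]
    off = D_IN * HID + HID
    w2 = e[off : off + HID * D_OUT].reshape(HID, D_OUT)
    b2 = e[off + HID * D_OUT :]
    x = np.concatenate([z, t[:, None]], axis=1)
    return np.tanh(x @ w1 + b1) @ w2 + b2

# --- one training step for a single (image, flow) pair ---
X = rng.random((256, 3))                     # colors sampled from image I
Z = X + 0.1                                  # stand-in for Z = T_theta(X)
t = rng.random(256)                          # t ~ Uniform[0, 1]
Zt = t[:, None] * Z + (1 - t[:, None]) * X   # Z_t = t*Z + (1 - t)*X
v_t = Z - X                                  # stand-in for v_theta(Z_t, t)

W_enc = rng.standard_normal((16, N_THETA)) * 0.01  # stand-in encoder
feat = rng.random(16)                        # stand-in features Enc(I')
e = feat @ W_enc                             # predicted flow weights

v_e = mod_flow(e, Zt, t)
loss = np.mean(np.sum((v_t - v_e) ** 2, axis=1))   # E||v_t - v_e||^2
# a real implementation backpropagates `loss` into the encoder weights
```

The design point the sketch makes concrete is the modulation step (line 13 of Algorithm 1): the encoder output is not an embedding fed into the flow, it literally *is* the flow's weight vector.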