Semantix: An Energy-guided Sampler for Semantic Style Transfer

Authors: Huiang He, Minghui Hu, Chuanxia Zheng, Chaoyue Wang, Tat-Jen Cham

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results demonstrate that Semantix not only effectively accomplishes the task of semantic style transfer across images and videos, but also surpasses existing state-of-the-art solutions in both fields. ... In this section, we conduct an exhaustive experimental analysis to substantiate the efficacy and superiority of our proposed method through qualitative comparison (Sec. 5.1), quantitative comparison (Sec. 5.2) and ablation study (Sec. 5.3)." |
| Researcher Affiliation | Collaboration | Huiang He (South China University of Technology); Minghui Hu (Spell Brush & Nanyang Technological University); Chuanxia Zheng (VGG, University of Oxford); Chaoyue Wang (The University of Sydney); Tat-Jen Cham (College of Computing and Data Science, Nanyang Technological University) |
| Pseudocode | Yes | "Algorithm 1: Proposed Semantix" |
| Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository. |
| Open Datasets | Yes | "We select the COCO (Lin et al., 2014) dataset as the source of context images and obtain style images from WikiArt (Tan et al., 2018) and appearance images from Cross-Image (Alaluf et al., 2023)." |
| Dataset Splits | No | The paper describes its method as "training-free" and evaluates on "1000 sampled context-style image pairs" and "100 stylized videos." It specifies the number of evaluation samples but provides no traditional train/validation/test splits, since the method involves no training phase. |
| Hardware Specification | Yes | "We use NVIDIA A100 (80G) GPUs for all experiments." |
| Software Dependencies | No | The paper mentions building upon the "pre-trained Stable Diffusion v1.5 model" and using "AnimateDiff (Guo et al., 2023)" as a base model, but it does not give version numbers for programming languages, libraries, or operating systems used for implementation. |
| Experiment Setup | Yes | "We invert the input images or videos into noises through DDPM inversion across 60 timesteps. For classifier-free guidance, we set the scale factor ω = 3.5, aligning it with the sampling procedures. During the sampling process, the features for guidance are extracted from the second and third blocks of the UNet's decoder. In image style transfer tasks, we adjust the weights of style feature guidance, spatial feature guidance and semantic distance regularisation to γ_ref = 3.0, γ_c = 0.9, γ_reg = 1.0, respectively. Additionally, we incorporate a 2D position encoding into the features and assign it a weight of λ_pe = 3.0. For the video task, the corresponding hyper-parameters are set to γ_ref = 6.0, γ_c = 3.0, γ_reg = 5.0, λ_pe = 3.0. We further employ a hard clamp in the range of [−1, 1] for all guidance. After 20 denoising timesteps, we apply AdaIN (Huang and Belongie, 2017) for the style latents x_t^ref and output latents x_t^out." |
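Two pieces of the reported setup are standard operations that can be sketched concretely: the hard clamp on all guidance terms (range [−1, 1]) and the AdaIN step (Huang and Belongie, 2017) applied to the style and output latents after 20 denoising timesteps. The sketch below is a minimal, illustrative implementation of these two operations in numpy; the function names, tensor shapes, and axis conventions are assumptions for illustration, not the authors' code.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (Huang & Belongie, 2017):
    re-normalize `content` so its per-channel mean/std match `style`.
    Shapes are assumed (C, H, W); statistics are taken over H and W."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

def clamp_guidance(grad, lo=-1.0, hi=1.0):
    """Hard clamp on a guidance term, matching the paper's [-1, 1] range."""
    return np.clip(grad, lo, hi)
```

In the reported configuration, the clamp would be applied to every guidance term at each sampling step, while AdaIN would align the statistics of the output latents with those of the style latents only after the first 20 denoising timesteps.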