A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Authors: Guoxuan Xia, Harleen Hanspal, Petru-Daniel Tudosiu, Shifeng Zhang, Sarah Parisot

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We perform controlled experiments on ImageNet across diffusion-based/flow-based and autoregressive (AR) models. First, we establish control token prefilling as a simple, general and performant baseline approach for transformers. We then investigate previously underexplored sampling time enhancements, showing that extending classifier-free guidance to control, as well as softmax truncation, have a strong impact on control-generation consistency.
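The two sampling-time enhancements quoted above can be illustrated concretely. The sketch below is an assumption-laden illustration, not the paper's code: the function names, the two guidance scales (gamma_y for the class condition, gamma_c for the control condition), and the top-k form of softmax truncation are hypothetical choices for the sake of example.

```python
import math

def cfg_extended(logit_uncond, logit_class, logit_full, gamma_y, gamma_c):
    """Classifier-free guidance extended to a control signal (illustrative).

    logit_uncond: unconditional prediction
    logit_class:  prediction conditioned on class y only
    logit_full:   prediction conditioned on class y and control c
    The two scales gamma_y / gamma_c weight each conditioning step separately.
    """
    return (logit_uncond
            + gamma_y * (logit_class - logit_uncond)
            + gamma_c * (logit_full - logit_class))

def truncated_softmax(logits, k):
    """One common form of softmax truncation: keep only the k largest logits,
    zero out the rest, and renormalise the remaining probabilities."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in keep)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) if i in keep else 0.0
            for i in range(len(logits))]
    total = sum(exps)
    return [e / total for e in exps]
```

With gamma_c = 0 the first function reduces to standard class-conditional classifier-free guidance, which is what makes the "extension to control" a strict generalisation.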
Researcher Affiliation Collaboration Guoxuan Xia, Harleen Hanspal, Petru-Daniel Tudosiu, EMAIL Shifeng Zhang & Sarah Parisot EMAIL Work done at Huawei Noah's Ark Lab
Pseudocode No The paper describes methods using prose and mathematical equations (Eq. 1-14) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code: https://github.com/guoxoug/transformer-imagenet-ctrl.
Open Datasets Yes To this end, we perform controlled experiments on ImageNet (Deng et al., 2009) over two representative but contrasting generative modelling approaches... and we train and evaluate on class-conditioned ImageNet (Deng et al., 2009), a well-established benchmark for image generation.
Dataset Splits Yes For most evaluations we generate 10K samples for evaluation, conditioned on controls extracted from the first 10 images of each of the 1000 classes in the ImageNet validation dataset. In a few cases, to compare with the literature, we generate using controls from all 50K validation images. We use fixed random seeds.
Hardware Specification Yes Inference is performed on a single NVIDIA Tesla V100-32GB-SXM2.
Software Dependencies No The paper mentions using 'kornia' for Canny edge map extraction but does not provide specific version numbers for it or any other key software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes We finetune for 10 epochs with control conditioning with batch size 256 (∼50K iterations) using the original optimisers and hyperparameters. Following the original papers we linearly increase guidance scale γy from zero over generation scales for VAR, whilst keeping it constant for SiT.
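The quoted schedule (guidance scale γy increased linearly from zero over VAR's generation scales) can be sketched as below. This is a minimal illustration under assumptions: the function name and the convention that the final scale reaches gamma_max are hypothetical, not taken from the paper's code.

```python
def linear_guidance_schedule(scale_idx, num_scales, gamma_max):
    """Guidance scale at generation scale `scale_idx` (0-indexed),
    increasing linearly from 0 at the first scale to gamma_max at the last.
    For a constant schedule (as quoted for SiT), one would instead
    return gamma_max at every step."""
    if num_scales < 2:
        return gamma_max
    return gamma_max * scale_idx / (num_scales - 1)
```

For example, with 10 generation scales and gamma_max = 4.0, the schedule runs 0.0, ~0.44, ..., 4.0 across the coarse-to-fine scales.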