Object-Centric Relational Representations for Image Generation

Authors: Luca Butera, Andrea Cini, Alberto Ferrante, Cesare Alippi

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that the proposed approach compares favorably against relevant baselines. We carried out our experiments with a GAN, even though the same principles can be extended to work with other methodologies. [...] Table 1 shows the Fréchet Inception Distance (FID) (Heusel et al., 2017) and Inception Score (IS) (Salimans et al., 2016) for our model and the baselines w.r.t. both datasets, together with the Structural Similarity Index Measure (SSIM) (Wang et al., 2004) for the PRO dataset.
Researcher Affiliation | Academia | Luca Butera (EMAIL), The Swiss AI Lab IDSIA & Università della Svizzera italiana; Andrea Cini (EMAIL), The Swiss AI Lab IDSIA & Università della Svizzera italiana; Alberto Ferrante (EMAIL), The Swiss AI Lab IDSIA & Università della Svizzera italiana; Cesare Alippi (EMAIL), The Swiss AI Lab IDSIA & Università della Svizzera italiana, Politecnico di Milano
Pseudocode | No | The paper describes its methodology using mathematical equations (e.g., Equations (2), (3), (5)) and architectural descriptions (e.g., in Tables 3-11), but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | The code to reproduce the experiments and generate the PRO dataset is available online at https://github.com/LucaButera/graphose_ocrrig. It is all written in Python 3.9, leveraging PyTorch (Paszke et al., 2019) and PyTorch Lightning (Falcon & team, 2019) to define models and data, while Hydra (Yadan, 2019) is used to manage the experiment configuration. Weights & Biases (Biewald, 2020) is used to log and compare experiment results.
Open Datasets | Yes | To the best of our knowledge, this is the first work to use object-centric graph-based relational representations to condition a neural generative model. Furthermore, we complement the methodological contributions by introducing a benchmark for pose-conditioned image generation: we propose a synthetic dataset named Pose-Representable Objects (PRO), which aims at assessing the performance of generative models in conditioning image generations on fine-grained structural and semantic information of the objects in the scene. [...] For the Deep Fashion dataset, we use just the Fashion Landmark Detection and the In-Shop Retrieval subsets. The keypoint annotations for Deep Fashion In-Shop Retrieval and Market 1501 are based on Zhu et al. (2019), and were generated by using the open-source software OpenPose (Cao et al., 2019). For Deep Fashion Fashion Landmark we instead rely on MediaPipe's BlazePose (Bazarevsky et al., 2020), while MPII Human Pose already contains pose features.
Dataset Splits | Yes | Regarding dataset size, in our experiments we generated 100000 images. We employed 1000 of these as validation set, and another 1000 as test set. Images were sampled so that the dataset contains a uniform distribution of different objects. Moreover, the number of objects that appear in an image at once, between 1 and 4, is also uniformly distributed across the dataset. Train, validation and test splits, while random, preserved these uniform distributions. [...] The Humans dataset consists of roughly 300000 images, coming from MPII Human Pose (Andriluka et al., 2014), Deep Fashion (Liu et al., 2016) and Market 1501 (Zheng et al., 2015) [...] Out of these, around 3000 images were used for validation and an analogous number for testing. Train, validation, and test sets were partitioned at random, but the proportion of samples from the 3 original benchmarks was maintained.
Hardware Specification | Yes | We run all the experiments on an NVIDIA RTX A5000 GPU equipped with 24 GB of VRAM.
Software Dependencies | Yes | It is all written in Python 3.9, leveraging PyTorch (Paszke et al., 2019) and PyTorch Lightning (Falcon & team, 2019) to define models and data, while Hydra (Yadan, 2019) is used to manage the experiment configuration. Weights & Biases (Biewald, 2020) is used to log and compare experiment results.
Experiment Setup | Yes | Each model is trained under the same settings and metrics are computed over three different runs each. In particular, we use the Adam optimizer (Kingma & Ba, 2015) with a cosine annealing learning rate schedule (Loshchilov & Hutter, 2017), with period 300 for the PRO dataset and 150 for the Humans one, starting learning rate 0.002 and final learning rate 0.00002. Models are trained up to 300 (PRO) or 150 (Humans) epochs each, with a batch size of 64 and early stopping with patience 50 on the FID.
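The split strategy quoted in the Dataset Splits row (random partitioning that preserves each source's or object's share of the data) can be sketched in plain Python. This is our own illustration, not code from the paper's repository; the function name `stratified_split` and its signature are assumptions.

```python
import random

def stratified_split(samples, label_fn, n_val, n_test, seed=0):
    """Randomly partition samples into train/val/test while keeping each
    label's share of the val and test sets proportional to its share of
    the full dataset (a sketch of the paper's split strategy)."""
    rng = random.Random(seed)
    # Group samples by label (e.g., source benchmark or object type).
    by_label = {}
    for s in samples:
        by_label.setdefault(label_fn(s), []).append(s)
    total = len(samples)
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        # Each label contributes to val/test in proportion to its size.
        k_val = round(n_val * len(items) / total)
        k_test = round(n_test * len(items) / total)
        val += items[:k_val]
        test += items[k_val:k_val + k_test]
        train += items[k_val + k_test:]
    return train, val, test
```

For the Humans dataset the label would be the source benchmark (MPII Human Pose, Deep Fashion, or Market 1501); for PRO it would be the object type.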
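The optimizer settings quoted in the Experiment Setup row map directly onto standard PyTorch components. A minimal sketch under the paper's stated hyperparameters; the `nn.Linear` stand-in model and the empty loop body are ours, and the paper's actual GAN training step is not reproduced here.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical stand-in for the paper's generator/discriminator.
model = nn.Linear(8, 8)

EPOCHS = 300  # 300 for PRO, 150 for Humans (also the annealing period)

# Quoted settings: Adam, starting LR 0.002, final LR 0.00002,
# cosine annealing over the full training run.
optimizer = Adam(model.parameters(), lr=2e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=2e-5)

for epoch in range(EPOCHS):
    # ... training and validation steps (batch size 64, FID-based
    # early stopping with patience 50) would go here ...
    optimizer.step()
    scheduler.step()

# After the final epoch the learning rate has annealed to the 0.00002 floor.
```

In the paper's setup the early stopping on FID would be handled by the training framework (PyTorch Lightning); here it is only indicated in a comment to keep the sketch short.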