FlexDataset: Crafting Annotated Dataset Generation for Diverse Applications
Authors: Ellen Yi-Ge, Leo Shawn
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that FlexDataset sets a new standard in synthetic dataset generation across multiple datasets and tasks, including zero-shot and long-tail scenarios. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²University of the Chinese Academy of Sciences, EMAIL, EMAIL |
| Pseudocode | No | The paper includes a figure (Figure 3) illustrating the model architecture and mathematical equations, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/EllenYiGe/FlexDataset |
| Open Datasets | Yes | For training the C2I model and perception decoders, following the methodology of LayoutDiffusion (Zheng et al. 2023), we employ the COCO 2017 Stuff Segmentation Challenge subset. Each image contains bounding boxes and pixel-level segmentation masks for 80 categories of things and 91 categories of stuff. From these, we select images that feature between 3 and 8 objects, each covering more than 2% of the image area and not belonging to a crowd. ... We evaluate FlexDataset on the PPM-100 benchmark (Ke et al. 2022)... NYU Depth V2... VOC 2012. |
| Dataset Splits | Yes | FlexDataset uses 80k synthetic images based on 400 real images. ... For Zero-Shot, consistent with prior work (Li et al. 2023c; Wu et al. 2023b,a), FlexDataset is trained using only 15 seen categories and evaluated across all 20 categories. In the Long-tail configuration, the 20 categories are divided into head classes (10 classes, 20 images per class) and tail classes (10 classes, 2 images per class). |
| Hardware Specification | Yes | For all tasks, we train FlexDataset for approximately 200 iterations using images of size 512 × 512 on a single Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using an optimizer from (Loshchilov and Hutter 2017) and referring to other models like VGG-16, Depth-Anything, and SEEM, but does not provide specific version numbers for its own software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For all tasks, we train FlexDataset for approximately 200 iterations using images of size 512 × 512 on a single Tesla V100 GPU. We use the optimizer from (Loshchilov and Hutter 2017) with a learning rate of 0.0002. |
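The Open Datasets row quotes concrete COCO-Stuff selection criteria (3–8 objects per image, each covering more than 2% of the image area, none marked as a crowd). A minimal sketch of that filter is below; `filter_coco_images` and its argument layout are hypothetical helpers written for illustration, not the authors' released code, and the interpretation that only qualifying objects are counted toward the 3–8 range is an assumption.

```python
def filter_coco_images(images, anns_by_image,
                       min_objs=3, max_objs=8, min_frac=0.02):
    """Hypothetical sketch of the paper's COCO-Stuff selection criteria.

    `images` is a list of COCO-style image records ({"id", "width",
    "height"}); `anns_by_image` maps image id -> list of annotation
    dicts ({"area", "iscrowd"}). An object "qualifies" if it is not a
    crowd region and covers more than `min_frac` of the image area;
    an image is kept when it has 3-8 qualifying objects (assumption:
    non-qualifying objects are ignored rather than disqualifying).
    """
    kept = []
    for img in images:
        image_area = img["width"] * img["height"]
        anns = anns_by_image.get(img["id"], [])
        qualifying = [
            a for a in anns
            if not a.get("iscrowd", 0) and a["area"] > min_frac * image_area
        ]
        if min_objs <= len(qualifying) <= max_objs:
            kept.append(img["id"])
    return kept
```

In practice the same predicate would be applied over `pycocotools` annotations; the plain-dict form here just makes the quoted thresholds explicit.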
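The Dataset Splits row describes two evaluation protocols: a zero-shot split (train on 15 seen categories, evaluate on all 20) and a long-tail split (10 head classes with 20 images each, 10 tail classes with 2 each). A hedged sketch of building both splits is below; the function name and the seeded random assignment of categories are illustrative assumptions, since the paper's actual category partition is not quoted in this review.

```python
import random


def make_splits(categories, seed=0):
    """Illustrative sketch of the two protocols (not the authors' code).

    Zero-shot: 15 of the 20 categories are marked "seen" for training,
    while evaluation covers all 20. Long-tail: 10 head classes get a
    budget of 20 images each, the remaining 10 tail classes get 2 each.
    Which categories land in each group is decided here by a seeded
    shuffle -- an assumption made purely for demonstration.
    """
    assert len(categories) == 20, "protocol is defined over 20 categories"
    rng = random.Random(seed)
    shuffled = list(categories)
    rng.shuffle(shuffled)

    zero_shot = {"seen": sorted(shuffled[:15]), "eval": sorted(categories)}
    # image budget per category under the long-tail protocol
    long_tail = {c: (20 if c in shuffled[:10] else 2) for c in categories}
    return zero_shot, long_tail
```

Note that under these budgets the long-tail training set contains 10 × 20 + 10 × 2 = 220 real images per run, which is the scale the review's "400 real images" figure should be read against.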