Teaching Diffusion Models to Ground Alpha Matte
Authors: Tianyi Xiang, Weiying Zheng, Yutao Jiang, Tingrui Shen, Hewei Yu, Yangyang Xu, Shengfeng He
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments highlight our model's adaptability, precision, and computational efficiency, setting a new benchmark for flexible, text-driven image matting solutions. The code is available at https://github.com/xty435768/Teach-Diffusion-Matting. Section 4 (Experiment) comprises: 4.1 Implementation Details; 4.2 Evaluate Metrics; 4.3 Comparison on Soft Grounding; 4.4 Comparison on Generalization Ability; 4.5 Ablation Studies. Quantitative Results: We show the quantitative comparison on soft grounding in Tab. 1. |
| Researcher Affiliation | Academia | Tianyi Xiang, Department of Computer Science, City University of Hong Kong; Weiying Zheng, School of Computing and Data Science, The University of Hong Kong; Yutao Jiang, School of Computer Science and Engineering, South China University of Technology; Tingrui Shen, School of Computer Science and Engineering, South China University of Technology; Hewei Yu, School of Computer Science and Engineering, South China University of Technology; Yangyang Xu, School of Intelligence Science and Engineering, Harbin Institute of Technology (Shenzhen); Shengfeng He, School of Computing and Information Systems, Singapore Management University |
| Pseudocode | No | The paper describes its methodology in Section 3, titled "Method". This section includes mathematical formulations and textual descriptions of the model's pipeline, objectives, and structural optimizations, as well as an overview diagram (Fig. 3). However, it does not contain any explicitly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Extensive experiments highlight our model's adaptability, precision, and computational efficiency, setting a new benchmark for flexible, text-driven image matting solutions. The code is available at https://github.com/xty435768/Teach-Diffusion-Matting. |
| Open Datasets | Yes | The data used to train our model comprises 4 matting datasets (RefMatte (Li et al., 2023a), P3M10K (Li et al., 2021a), AM2K (Li et al., 2022), RM1K (Wang et al., 2023b)) and 1 grounding segmentation dataset (RefCOCO (Kazemzadeh et al., 2014)). |
| Dataset Splits | Yes | We apply two referring natural matting benchmarks, RefMatte-Test (Li et al., 2023a) and RefMatte-RW100 (Li et al., 2023a), for soft grounding evaluation. The former is a composition dataset (6,243 instances among 2,500 images) and the latter a real-world dataset (221 instances among 100 images). Every instance in these two benchmarks has 4 different expressions, so we evaluate all baselines and ours using all expressions and report the average result over the 4 expressions. During evaluation, the input resolution for all methods is set to 512×512, and the metrics are calculated at this resolution. All stages of our model's training process adopt a consistent data scheduling strategy: we train the model on RefMatte during odd-numbered iterations and on RefCOCO during even-numbered iterations, and we insert a special iteration after every 4 iterations to perform training on P3M10K, AM2K, and RM1K. |
| Hardware Specification | Yes | We also report the average inference time per sample in milliseconds, using the same machine with a single RTX 3090. All the training work is done on NVIDIA A100 80GB GPU(s). |
| Software Dependencies | No | The paper mentions several tools and models used, such as BLIP2 (Li et al., 2023b), AdamW (Loshchilov & Hutter, 2019), and Spconv (Contributors, 2022). However, it does not specify explicit version numbers for general software dependencies like Python, PyTorch, or CUDA, which are crucial for full reproducibility. |
| Experiment Setup | Yes | We set the kernel size of the morphological operation to 15, and we set (λ_STM, λ_CTM, λ_SG, λ_αlr, λ_R) to (10, 0.1, 0.5, 10, 1). The timestep input of the SD model is set to 1.0 during both training and inference, consistent with previous works (Zhao et al., 2023a; Lee et al., 2024; Xu et al., 2024a). Other training settings, including batch size, learning rate, total iterations, and the rationale behind the λ settings, can be found in the Appendix (Table 4: Hyperparameters for all training stages). |
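The data-scheduling strategy quoted under Dataset Splits (RefMatte on odd iterations, RefCOCO on even iterations, plus an extra iteration on P3M10K/AM2K/RM1K after every 4 iterations) can be sketched as a simple schedule builder. This is a hypothetical reading of the paper's description, not code from the authors' repository; the function name and dataset labels are illustrative.

```python
def build_schedule(total_main_iters: int) -> list[str]:
    """Sketch of the alternating training schedule described in the paper.

    Main iterations alternate RefMatte (odd) / RefCOCO (even); after every
    4 main iterations, one special iteration on the auxiliary matting
    datasets (P3M10K, AM2K, RM1K) is inserted. One possible interpretation.
    """
    plan = []
    for i in range(1, total_main_iters + 1):
        # Odd-numbered main iterations use RefMatte, even-numbered use RefCOCO.
        plan.append("RefMatte" if i % 2 == 1 else "RefCOCO")
        # "Insert a special iteration after every 4 iterations."
        if i % 4 == 0:
            plan.append("P3M10K/AM2K/RM1K")
    return plan
```

For example, the first block of the schedule would read RefMatte, RefCOCO, RefMatte, RefCOCO, then the auxiliary-dataset iteration, and the pattern repeats.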