DiMSOD: A Diffusion-Based Framework for Multi-Modal Salient Object Detection

Authors: Shuo Zhang, Jiaming Huang, Wenbing Tang, Yan Wu, Terrence Hu, Xiaogang Xu, Jing Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DiMSOD efficiently detects salient objects across RGB, RGB-D, and RGB-T datasets, achieving superior performance compared to previous well-established methods.
Researcher Affiliation | Collaboration | 1) Shanghai Key Laboratory of Trustworthy Computing, East China Normal University; 2) Technology Center, Huolala; 3) College of Computing and Data Science, Nanyang Technological University; 4) The Chinese University of Hong Kong.
Pseudocode | No | The paper describes the network architecture and methods in detail but does not contain pseudocode or an algorithm block.
Open Source Code | No | The code for evaluating the model is derived from F3Net. (This refers to third-party evaluation code, not the source code of the proposed DiMSOD model itself; there is no explicit statement about releasing their own code.)
Open Datasets | Yes | DiMSOD is trained jointly on three different types of SOD datasets. Following recent work, the training set consists of the following subsets, resized to 512×512: the RGB dataset DUTS-TR (Wang et al. 2017) with 10,553 images; the RGB-T dataset VT5000 (Tu et al. 2022b) with 2,500 images; and the RGB-D datasets NJUD (Ju et al. 2014) with 1,485 images, NLPR (Peng et al. 2014) with 700 images, and DUTLF-Depth (Piao et al. 2019) with 800 images. Stable Diffusion is used as the backbone when implementing DiMSOD.
Dataset Splits | Yes | DiMSOD is trained jointly on three different types of SOD datasets. Following recent work, the training set consists of the following subsets, resized to 512×512: the RGB dataset DUTS-TR (Wang et al. 2017) with 10,553 images; the RGB-T dataset VT5000 (Tu et al. 2022b) with 2,500 images; and the RGB-D datasets NJUD (Ju et al. 2014) with 1,485 images, NLPR (Peng et al. 2014) with 700 images, and DUTLF-Depth (Piao et al. 2019) with 800 images. For RGB, DiMSOD is evaluated on 5 widely used benchmark datasets not seen during training: DUT-OMRON (5,168 images), ECSSD (1,000 images), PASCAL-S (850 images), HKU-IS (4,447 images), and DUTS-TE (5,019 images). For RGB-D, the test sets of DUTLF-Depth (400 images), NJUD (500 images), NLPR (300 images), SIP (929 images), and LFSD (100 images) are used. For RGB-T, the test sets of VT5000 (2,500 images), VT821 (821 images), and VT1000 (1,000 images) are used.
Hardware Specification | Yes | Training takes 100 epochs with a batch size of 32 on 4 Nvidia A100 GPUs.
Software Dependencies | No | Stable Diffusion is used as the backbone when implementing DiMSOD. The initial pre-training configurations with a v-objective (Salimans and Ho 2022) are followed in the experiments. In training, the DDPM noise scheduler (Ho, Jain, and Abbeel 2020b) is used with 1,000 diffusion steps; for inference, the DDIM scheduler is used with 20 sampling steps. (No specific version numbers for libraries such as PyTorch or CUDA, or the exact Stable Diffusion release, are provided.)
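The step counts reported above imply a standard DDIM-style subsampling of the 1,000-step DDPM training schedule. A minimal sketch of that timestep selection (illustrative only; the function name and even-spacing strategy are assumptions, not the authors' code):

```python
def ddim_timesteps(train_steps: int = 1000, sample_steps: int = 20) -> list[int]:
    """Pick an evenly spaced, descending subset of the DDPM training
    timesteps, as DDIM does when sampling with fewer steps than training."""
    stride = train_steps // sample_steps  # 1000 // 20 = 50
    return list(range(train_steps - 1, -1, -stride))[:sample_steps]

# 20 timesteps from 999 down to 49, stepping by 50
steps = ddim_timesteps()
```

In practice this selection is handled internally by the scheduler implementation (e.g. a DDIM scheduler's inference-step configuration); the sketch only shows the arithmetic behind "1,000 training steps, 20 sampling steps".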
Experiment Setup | Yes | In training, the DDPM noise scheduler (Ho, Jain, and Abbeel 2020b) is used with 1,000 diffusion steps; for inference, the DDIM scheduler is used with 20 sampling steps. The final prediction combines outcomes from 10 inference iterations initialized with diverse initial noise. Training takes 100 epochs with a batch size of 32 on 4 Nvidia A100 GPUs, using the Adam optimizer with a learning rate of 3×10⁻⁵. Data augmentation consists of random horizontal and vertical flips.
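The hyper-parameters in this row can be collected into a small configuration sketch, together with one plausible reading of "combine outcomes from 10 inference iterations" as an element-wise average. All names here are illustrative assumptions; this is not the authors' released code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values reported in the paper's experiment setup
    epochs: int = 100
    batch_size: int = 32
    num_gpus: int = 4                  # Nvidia A100
    learning_rate: float = 3e-5        # Adam optimizer
    train_diffusion_steps: int = 1000  # DDPM noise scheduler
    inference_steps: int = 20          # DDIM sampling
    ensemble_runs: int = 10            # runs with diverse initial noise
    input_size: tuple = (512, 512)
    augmentations: tuple = ("random_hflip", "random_vflip")


def combine_predictions(runs: list) -> list:
    """Element-wise mean of flattened saliency maps from several inference
    runs; one simple way to combine the 10 noise-initialized predictions."""
    n = len(runs)
    return [sum(vals) / n for vals in zip(*runs)]
```

How exactly the 10 outcomes are combined (mean, vote, or otherwise) is not specified in the quoted text, so the averaging function above is only a placeholder for that step.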