Semantic-Space-Intervened Diffusive Alignment for Visual Classification

Authors: Zixuan Li, Lei Meng, Guoqing Chao, Wei Wu, Yimeng Yang, Xiaoshuo Yan, Zhuang Qi, Xiangxu Meng

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that SeDA achieves stronger cross-modal feature alignment, leading to superior performance over existing methods across multiple scenarios. Extensive experiments are conducted on the general dataset NUS-WIDE, the domain-specific dataset VIREO Food-172, and the video dataset MSRVTT, including performance comparisons, ablation studies, in-depth analysis, and case studies.
Researcher Affiliation | Academia | Zixuan Li¹, Lei Meng¹, Guoqing Chao², Wei Wu¹, Yimeng Yang¹, Xiaoshuo Yan¹, Zhuang Qi¹, Xiangxu Meng¹. ¹School of Software, Shandong University, Jinan, China; ²School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the proposed method (SeDA) in detail, including its modules (PFIN, DSL, DST) and their mathematical formulations and optimization processes. However, it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | No | The paper does not contain an unambiguous statement from the authors about releasing their code for the described methodology, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | VIREO Food-172 [Chen and Ngo, 2016]: a single-label dataset with 110,241 food images in 172 categories and an average of three text descriptions per image. NUS-WIDE [Chua et al., 2009]: a multi-label dataset of 203,598 images (after filtering) in 81 categories, with textual tags from a 1000-word vocabulary. MSRVTT [Xu et al., 2016]: a video dataset with 10,000 YouTube clips and 200,000 captions.
Dataset Splits | Yes | VIREO Food-172: 66,071 training and 33,154 test images. NUS-WIDE: 121,962 training and 81,636 test images. MSRVTT: 7,010 videos for training and 2,990 for testing.
Hardware Specification | Yes | Our experiments were conducted on a single NVIDIA Tesla V100 GPU.
Software Dependencies | Yes | Our experiments were conducted on a single NVIDIA Tesla V100 GPU, using PyTorch 1.10.2, and the batch size is 64.
Experiment Setup | Yes | In this experiment, we chose Adam as the optimizer for the model, with a weight decay of 1e-4. The learning rate for all neural networks was set between 1e-4 and 5e-5, and decayed to half of its original value every four training epochs. For the loss weights mentioned in the training strategy, we selected α1 and α2 between 0.1 and 2.0, the time step T between 900 and 1500, and the staged step t between 0 and 500, while β and γ were chosen from [0.5, 1.0, 1.5, 2.0]. Our experiments were conducted on a single NVIDIA Tesla V100 GPU, using PyTorch 1.10.2, and the batch size was 64.
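The reported setup (Adam with weight decay 1e-4, base learning rate between 5e-5 and 1e-4, halved every four epochs, batch size 64) can be sketched in plain Python. This is an illustrative reconstruction of the schedule and search ranges, not the authors' code; `lr_at_epoch` and the `search_space` dictionary are hypothetical names:

```python
def lr_at_epoch(base_lr: float, epoch: int, step: int = 4, factor: float = 0.5) -> float:
    """Learning rate after `epoch` epochs, multiplied by `factor` every `step` epochs
    (a step-decay schedule, as described in the experiment setup)."""
    return base_lr * factor ** (epoch // step)

# Hyperparameter ranges as reported in the paper's setup (illustrative layout).
search_space = {
    "base_lr": (5e-5, 1e-4),        # learning-rate range for all networks
    "weight_decay": 1e-4,           # Adam weight decay
    "batch_size": 64,
    "alpha1, alpha2": (0.1, 2.0),   # loss weights
    "T": (900, 1500),               # diffusion time step
    "t_staged": (0, 500),           # staged step
    "beta, gamma": [0.5, 1.0, 1.5, 2.0],
}

for epoch in (0, 4, 8):
    print(f"epoch {epoch}: lr = {lr_at_epoch(1e-4, epoch):.1e}")
```

In a PyTorch 1.10 training loop, the same schedule would typically be expressed with `torch.optim.Adam(..., weight_decay=1e-4)` and `torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.5)`.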