Revisiting Change Captioning from Self-supervised Global-Part Alignment

Authors: Feixiao Lv, Rui Wang, Lihua Jing

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show our method achieves the state-of-the-art results on four datasets."
Researcher Affiliation | Academia | ¹Institute of Information Engineering, CAS, Beijing, China; ²School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper includes mathematical formulations and flowcharts (Figures 2 and 3), but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "Birds-to-Words dataset (Forbes et al. 2019) consists of 41k sentences that describe fine-grained changes... CLEVR-Change dataset (Park, Darrell, and Rohrbach 2019) is a large-scale synthetic dataset... Spot-the-Diff dataset (Jhamtani and Berg-Kirkpatrick 2018) includes 13,192 aligned image pairs... Image Editing Request dataset (Tan et al. 2019) includes 3,939 aligned image pairs..."
Dataset Splits | Yes | "Birds-to-Words dataset (Forbes et al. 2019) consists of 41k sentences... This leads to 12,890/1,556/1,604 captions for train/val/test splits."
Hardware Specification | Yes | "Both training and inference are implemented with PyTorch (Paszke et al. 2019) on an RTX 3090 GPU."
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al. 2019)", "EVA-ViT-g/14 (Fang et al. 2023)", and "Vicuna-7B (Chiang et al. 2023)". While these are software packages and models, specific version numbers for the libraries and environment (e.g., PyTorch 1.9, Python 3.x, CUDA 11.x) are not provided.
Experiment Setup | Yes | "All hidden sizes are 512. Both training and inference are implemented with PyTorch (Paszke et al. 2019) on an RTX 3090 GPU. We apply EVA-ViT-g/14 (Fang et al. 2023) and Vicuna-7B (Chiang et al. 2023) as the image encoder and LLM, respectively. The above models without the proposed GPTA and SSFEC constitute our baseline. The head and layer numbers are set to 8 and 2 for the Input Representation step, and to 8 and 4 for the Self-supervised Fusion Change Encoding step, on all four datasets. During training, we use the Adam optimizer (Kingma and Ba 2014) to minimize the aforementioned losses, and all parameters except the MCA adapter are frozen."
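The training recipe quoted above (freeze everything except the MCA adapter, then optimize the remaining parameters with Adam) can be sketched in PyTorch. This is a minimal illustration only: the `MCAAdapter` module and the backbone stand-in are hypothetical placeholders, since the paper's actual architecture is not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's MCA adapter; its real internal
# structure is not specified in this report.
class MCAAdapter(nn.Module):
    def __init__(self, hidden_size: int = 512):  # "All hidden sizes are 512"
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual adapter: pass-through plus a learned correction.
        return x + self.proj(x)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the frozen image encoder / LLM stack.
        self.backbone = nn.Linear(512, 512)
        self.mca_adapter = MCAAdapter(512)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mca_adapter(self.backbone(x))

model = Model()

# Freeze all parameters except those of the MCA adapter.
for name, param in model.named_parameters():
    param.requires_grad = "mca_adapter" in name

# Adam only sees the trainable (adapter) parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Filtering the parameter list before constructing the optimizer (rather than relying on `requires_grad` alone) keeps optimizer state from being allocated for the frozen backbone.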