Find and Perceive: Tell Visual Change with Fine-Grained Comparison
Authors: Feixiao Lv, Rui Wang, Lihua Jing, Lijun Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive experiments on four change captioning datasets, and experimental results show that our proposed method F&P outperforms existing change caption methods and achieves new state-of-the-art performance. |
| Researcher Affiliation | Academia | 1Institute of Information Engineering, CAS, Beijing, China 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes its methodology through text and a block diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | We perform our main evaluation on two commonly used datasets, Birds-to-Words dataset [Forbes et al., 2019] and CLEVR-Change [Park et al., 2019] to verify the effectiveness of our method. In addition, we also compare our method with other methods on two additional datasets, Spot-the-Diff [Jhamtani and Berg-Kirkpatrick, 2018] and Image Editing-Request [Tan et al., 2019] to verify the generality of our method. |
| Dataset Splits | No | The paper mentions that 'Early-stop is applied on the main metric to avoid overfitting' which implies a validation set, but it does not specify explicit percentages or counts for training, validation, and test splits for any of the datasets used. |
| Hardware Specification | No | The paper describes model components like ResNet101, Transformer blocks, attention heads, and layer numbers, but does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions tools and models like 'word2vec [Mikolov et al., 2013]', 'ResNet101 [He et al., 2016]', 'Transformer blocks [Vaswani et al., 2017]', and 'CLIP features [Guo et al., 2022]', but does not provide specific version numbers for these or other software libraries/frameworks used for implementation. |
| Experiment Setup | Yes | For Transformer blocks, the attention head is set to 8, and layer number is set to 3 for multi-layer Transformer, 2 for fine-grained feature learning, 2 for different enhancement. To ensure stable and progressively refined pseudo label selection, we apply fixed thresholds to attention weights in each iteration (0.04 in the first and 0.06 in the second). These threshold values are determined based on experimental performance. ... In the fine-tuning stage, the learning rate is set as 3e-5. Early-stop is applied on the main metric to avoid overfitting. |
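The iterative pseudo-label selection described in the setup above can be sketched as follows. Only the two thresholds (0.04 and 0.06) come from the paper; the function name, input format, and example data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of fixed-threshold pseudo-label selection over two
# iterations. The thresholds are taken from the reported setup; all
# other names and shapes are assumptions for illustration.

THRESHOLDS = [0.04, 0.06]  # fixed attention-weight threshold per iteration

def select_pseudo_labels(attention_weights, iteration):
    """Keep positions whose attention weight exceeds the iteration's
    fixed threshold; the second iteration applies a stricter cutoff."""
    threshold = THRESHOLDS[iteration]
    return [i for i, w in enumerate(attention_weights) if w > threshold]

# Hypothetical attention weights from one layer.
weights = [0.01, 0.05, 0.07, 0.03]
first_pass = select_pseudo_labels(weights, iteration=0)   # w > 0.04 -> [1, 2]
second_pass = select_pseudo_labels(weights, iteration=1)  # w > 0.06 -> [2]
```

Raising the threshold between iterations, as the paper describes, progressively restricts selection to the most confidently attended regions.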