ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation

Authors: Shiqi Huang, Shuting He, Bihan Wen

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "We establish new experimental protocols and benchmarks, and extensive experiments convincingly demonstrate that ZoRI achieves the state-of-the-art performance on the zero-shot remote sensing instance segmentation task." "We establish two remote sensing zero-shot instance segmentation benchmarks with iSAID (Zamir et al. 2019) and NWPU-VHR-10 (Cheng et al. 2014; Su et al. 2019) datasets."
Researcher Affiliation | Academia | "Shiqi Huang¹*, Shuting He²*, Bihan Wen¹; ¹Nanyang Technological University, ²Shanghai University of Finance and Economics. EMAIL, EMAIL, EMAIL"
Pseudocode | No | The paper describes methods and formulations in prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/HuangShiqi128/ZoRI
Open Datasets | Yes | "We establish two remote sensing zero-shot instance segmentation benchmarks with iSAID (Zamir et al. 2019) and NWPU-VHR-10 (Cheng et al. 2014; Su et al. 2019) datasets."
Dataset Splits | No | "iSAID dataset is divided into 11 seen classes and 4 unseen classes (tennis court, helicopter, swimming pool and soccer ball field), which has the same seen/unseen split as DOTA (Zang et al. 2024; Xia et al. 2018), and the NWPU-VHR-10 dataset is split into 7 seen classes and 3 unseen classes (ship, basketball court and harbor). For the training set, only images containing seen class objects are selected, while any images with unseen classes are excluded to avoid information leakage. Dataset details can be found in the supplementary material (Huang, He, and Wen 2024)." The paper specifies class splits and data exclusion criteria but does not provide specific train/validation/test image splits within the main text.
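The leakage-avoidance rule quoted above (drop any training image that contains an unseen-class object) can be sketched as a simple filter. This is a hypothetical illustration, not the authors' code; the class names and the image-id-to-classes mapping are assumptions.

```python
# Hypothetical sketch of the seen/unseen training-set filter described in the paper.
# The annotation format (image id -> set of class names) is an assumption.

ISAID_UNSEEN = {"tennis_court", "helicopter", "swimming_pool", "soccer_ball_field"}

def build_training_set(annotations):
    """Keep only images whose objects all belong to seen classes.

    Any image containing even one unseen-class instance is dropped
    entirely, so no unseen-class information leaks into training.
    """
    return [
        image_id
        for image_id, classes in annotations.items()
        if not (classes & ISAID_UNSEEN)  # no overlap with unseen classes
    ]

# Example: the first image has only seen classes and is kept;
# the second contains a helicopter (unseen) and is excluded.
anns = {
    "img_001": {"plane", "ship"},
    "img_002": {"plane", "helicopter"},
}
print(build_training_set(anns))  # -> ["img_001"]
```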
Hardware Specification | Yes | "All experiments are conducted with one RTX A5000 GPU."
Software Dependencies | No | "The proposed method is developed based on FC-CLIP (Yu et al. 2023). We use the LAION-2B pretrained ConvNeXt-Large (Liu et al. 2022) from OpenCLIP (Ilharco et al. 2021) as the feature extractor. The mask generator follows Mask2Former (Cheng et al. 2022) with object query number set to 300. Prompt templates for RESISC45 (Cheng, Han, and Lu 2017) used in CLIP (Radford et al. 2021) are employed to obtain text embeddings with the pretrained CLIP text encoder." The paper mentions several software frameworks and models used, but it does not specify explicit version numbers for these dependencies (e.g., PyTorch version, Python version).
Experiment Setup | Yes | "We train the model for 50 epochs with training batch size 2. Input images are resized to 512×512 during training. Hyperparameters λ and α are set to 0.7 and 0.5, respectively. Instance number T and the number of trainable channels in KMA are empirically set to 1 and 32. The model is optimized using the AdamW optimizer. The learning rate is set to 1.25 × 10⁻⁵."
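The reported hyperparameters can be collected into a single configuration object for reproduction. The field names below are assumptions made for illustration; only the values come from the paper's experiment setup.

```python
# Hypothetical config sketch; field names are assumptions, values are
# taken from the paper's reported experiment setup.
from dataclasses import dataclass

@dataclass(frozen=True)
class ZoRITrainConfig:
    epochs: int = 50                      # training epochs
    batch_size: int = 2                   # training batch size
    image_size: tuple = (512, 512)        # input resize during training
    lambda_weight: float = 0.7            # hyperparameter λ
    alpha: float = 0.5                    # hyperparameter α
    instance_number_T: int = 1            # instance number T
    kma_trainable_channels: int = 32      # trainable channels in KMA
    optimizer: str = "AdamW"
    learning_rate: float = 1.25e-5
    object_queries: int = 300             # Mask2Former object query count

cfg = ZoRITrainConfig()
print(cfg.learning_rate)  # -> 1.25e-05
```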