OpenVIS: Open-vocabulary Video Instance Segmentation

Authors: Pinxue Guo, Hao Huang, Peiyang He, Xuefeng Liu, Tianjun Xiao, Wenqiang Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The experimental results demonstrate that the proposed InstFormer achieves state-of-the-art capabilities on a comprehensive OpenVIS evaluation benchmark, while also achieving competitive performance on the fully supervised VIS task.
Researcher Affiliation Collaboration Pinxue Guo (1,2*), Hao Huang (2), Peiyang He (2), Xuefeng Liu (2), Tianjun Xiao (2), Wenqiang Zhang (1,3); 1 Academy for Engineering and Technology, Fudan University; 2 Amazon Web Services; 3 School of Computer Science, Fudan University
Pseudocode No The paper describes the methods and framework components (InstFormer, InstCLIP, Universal Rollout Association) textually and through architectural diagrams (Figure 2, Figure 3) but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code https://github.com/PinxueGuo/OpenVIS
Open Datasets Yes Specifically, we evaluate the proposed model on the YouTube-VIS, BURST, LVVIS, and UVO datasets, encompassing a large number of novel categories, to comprehensively assess its diverse capacities. However, the training process only sees data from YouTube-VIS, which comprises only 40 categories.
Dataset Splits Yes Our OpenVIS model is trained only on YouTube-VIS (a widely used VIS dataset comprising 40 categories). This ensures that the categories present in the training data are a small-scale subset of those found in the test data. More discussion and analysis of the evaluation benchmark can be found in the Supplementary.
Hardware Specification Yes The whole training is done on 8 V100 GPUs for 3 hours.
Software Dependencies No The paper mentions several software components, models, and frameworks such as 'COCO-pretrained Mask2Former', 'ViT-B/32 of CLIP', and 'LoRA', but does not provide specific version numbers for any of these software dependencies.
Experiment Setup Yes InstFormer is trained using a two-stage approach, and the CLIP weights are frozen during the entire training. In the first stage, the open-world mask proposal network and InstCLIP (LoRA adapter) are trained for 6k iterations with L_I and an instance segmentation loss. Subsequently, we train the rollout tracker in the second stage, with all other weights frozen, using L_T for an additional 600 iterations.
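The two-stage schedule quoted above can be sketched as a configuration in code. This is a minimal, hypothetical outline for clarity only: the module names, loss symbols, and freezing logic below are assumptions, not the authors' released implementation.

```python
from dataclasses import dataclass

# Hedged sketch of the two-stage InstFormer training schedule described
# in the paper. Module and loss names are illustrative assumptions.

@dataclass
class TrainingStage:
    name: str
    iterations: int
    trainable: set   # modules whose weights are updated in this stage
    losses: tuple    # loss terms applied in this stage

# All components mentioned in the setup; the CLIP backbone itself is
# frozen for the entire training, per the paper.
ALL_MODULES = {
    "mask_proposal_network",   # open-world mask proposal network
    "instclip_lora_adapter",   # LoRA adapter on top of frozen CLIP
    "rollout_tracker",         # trained only in the second stage
    "clip_backbone",           # frozen throughout
}

STAGES = [
    TrainingStage(
        name="stage1_proposals_and_instclip",
        iterations=6_000,
        trainable={"mask_proposal_network", "instclip_lora_adapter"},
        losses=("L_I", "instance_segmentation_loss"),
    ),
    TrainingStage(
        name="stage2_rollout_tracker",
        iterations=600,
        trainable={"rollout_tracker"},
        losses=("L_T",),
    ),
]

def frozen_modules(stage: TrainingStage) -> set:
    """Modules whose weights are NOT updated in the given stage."""
    return ALL_MODULES - stage.trainable

# CLIP weights stay frozen in every stage, as stated in the paper.
assert all("clip_backbone" in frozen_modules(s) for s in STAGES)
```

Expressing the schedule this way makes the freezing pattern explicit: in stage 2 everything except the rollout tracker is frozen, matching the quoted setup.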