Robust and Consistent Online Video Instance Segmentation via Instance Mask Propagation

Authors: Miran Heo, Seoung Wug Oh, Seon Joo Kim, Joon-Young Lee

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments — Datasets: YouTube-VIS. YouTube-VIS (Yang, Fan, and Xu 2019) is a standard benchmark dataset for VIS in three versions (2019/2021/2022). Each version is designed to segment objects from 40 predefined categories within videos. ... Quantitative Results — YouTube-VIS 2019 & 2021: Due to space constraints, detailed results are provided in the supplementary material. ... Ablation Study: In this section, we analyze the main components of RoCoVIS and evaluate their impact in Tab. 2.
Researcher Affiliation | Collaboration | Miran Heo¹*, Seoung Wug Oh², Seon Joo Kim¹, Joon-Young Lee² — ¹Yonsei University, ²Adobe Research
Pseudocode | No | The paper describes its methodology through textual explanations and mathematical equations (Eqs. 1-4) in the 'Method' section, but it contains no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor any structured code-like procedures.
Open Source Code | No | The paper makes no explicit statement about releasing source code for the described methodology and provides no link to a code repository.
Open Datasets | Yes | Datasets: YouTube-VIS. YouTube-VIS (Yang, Fan, and Xu 2019) is a standard benchmark dataset for VIS in three versions (2019/2021/2022). ... OVIS. The Occluded-VIS (OVIS) dataset (Qi et al. 2021) has been introduced to specifically address the challenging scenario of heavy occlusions between objects. ... HQ-YTVIS. HQ-YTVIS (Ke et al. 2022b) refines the mask annotations of YouTube-VIS 2019. ... VIPSeg. We utilize VIPSeg (Miao et al. 2022), introduced for Video Panoptic Segmentation (VPS) (Kim et al. 2020).
Dataset Splits | Yes | YouTube-VIS 2022: Tab. 1 showcases our performance on the challenging benchmark, the long-video validation split of YouTube-VIS 2022. ... OVIS: We also present our results on the OVIS dataset in Tab. 1, which features highly occluded instances across long videos. ... HQ-YTVIS & VIPSeg-things: We demonstrate that RoCoVIS produces high-quality, consistent mask outputs in Tab. 3. ... VIPSeg: We follow the original data split, while simply converting the things annotations into VIS annotations.
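The YouTube-VIS-family datasets cited above ship their annotations in a COCO-style video JSON layout (top-level "videos", "annotations", and "categories" lists, with one segmentation entry per frame). A minimal sketch of reading such a file, using an invented two-frame toy annotation rather than real data:

```python
# Toy YouTube-VIS-style annotation (invented example, not real dataset content).
toy = {
    "videos": [{"id": 1, "length": 2, "height": 720, "width": 1280,
                "file_names": ["v1/00000.jpg", "v1/00001.jpg"]}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}],
    "annotations": [
        # One segmentation slot per frame; None marks frames where the
        # instance is absent (e.g. occluded or out of view).
        {"id": 10, "video_id": 1, "category_id": 1,
         "segmentations": [{"counts": "...", "size": [720, 1280]}, None]},
    ],
}

def instances_per_category(ann):
    """Count annotated instance tracks per category name."""
    names = {c["id"]: c["name"] for c in ann["categories"]}
    counts = {}
    for a in ann["annotations"]:
        name = names[a["category_id"]]
        counts[name] = counts.get(name, 0) + 1
    return counts

print(instances_per_category(toy))  # {'person': 1}
```

In the full datasets the same loop runs over thousands of tracks; the per-frame `None` convention is what lets one record represent a track through occlusions.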
Hardware Specification | No | The paper provides no specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions backbones such as Swin-L and ResNet-50 and Transformer-based architectures, but it does not name any software libraries, frameworks, or programming languages with version numbers.
Experiment Setup | No | The paper describes conceptual aspects of training and inference, including modifications to the UVLA criterion, but the main text provides no specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, epochs, optimizer settings).
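For context, this is a sketch of the kind of disclosure the row above finds missing. Every value below is an invented placeholder for illustration; none of these numbers come from the RoCoVIS paper.

```python
# Illustrative placeholders only: the paper does not report these values,
# and the numbers below are NOT taken from RoCoVIS.
train_config = {
    "backbone": "ResNet-50",   # one of the backbones the paper does name
    "optimizer": "AdamW",      # assumption; common for Transformer-based VIS
    "learning_rate": 1e-4,     # placeholder
    "batch_size": 16,          # placeholder
    "iterations": 90_000,      # placeholder
}

def is_reproducible_setup(cfg, required=("optimizer", "learning_rate", "batch_size")):
    """Check that a reported setup covers the basics a reproduction needs."""
    return all(k in cfg for k in required)

print(is_reproducible_setup(train_config))  # True
```

A reproducibility reviewer can apply the same checklist-style test to the main text of any paper: if these keys cannot be filled in from the text, the setup is under-specified.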