Learning Spatial-Semantic Features for Robust Video Object Segmentation

Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experimental results show that the proposed method achieves state-of-the-art performance on benchmark datasets, including DAVIS 2017 test (87.8%), YouTube-VOS 2019 (88.1%), MOSE val (74.0%), and LVOS test (73.0%), demonstrating the effectiveness and generalization capacity of the model.
Researcher Affiliation Academia 1. Pengcheng Laboratory; 2. Harbin Institute of Technology, Shenzhen; 3. Pazhou Lab (Huangpu); 4. Dalian University of Technology; 5. University of California at Merced; 6. Yonsei University
Pseudocode No The paper describes the algorithm using text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes The source code and trained models are released at https://github.com/yahooo-m/S3.
Open Datasets Yes We evaluate the proposed method extensively on five benchmarks, including DAVIS 2017 (Pont-Tuset et al., 2017), YouTube-VOS 2018 (Xu et al., 2018), YouTube-VOS 2019, LVOS (Hong et al., 2023), and MOSE (Ding et al., 2023).
Dataset Splits No The paper describes how the training data is composed (e.g., the MEGA dataset construction, DAVIS expanded five times) and the sampling strategies for training frames and targets, but it does not specify the train/validation/test splits for the benchmark datasets precisely enough to reproduce the data partitioning used in the experiments.
Hardware Specification Yes All our models are trained on a machine with 8 x NVIDIA V100 GPUs and tested using one NVIDIA V100 GPU.
Software Dependencies No The paper mentions PyTorch for implementing the network but does not specify a version number or list any other software dependencies with version numbers.
Experiment Setup Yes For optimization, the AdamW (Kingma & Ba, 2014) optimizer is used with a learning rate of 5e-5 and a weight decay of 0.5. We train the model for 125K iterations with a batch size of 16 on the video dataset, and 195K iterations on the MEGA dataset. ... Table 5: Training parameters — optimizer: AdamW; base learning rate: 5e-5; weight decay: 0.05; droppath rate: 0.15; batch size: 16; num ref frames: 3; num frames: 8; max-skip: [5, 10, 15, 5]; max-skip-itr: [0.1, 0.3, 0.8, 1]; iterations: 150,000 / 190,000; learning rate schedule: StepLR.
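The max-skip entries in Table 5 pair frame-sampling skip values with training-progress milestones. A minimal sketch of one plausible interpretation (the function name and the exact pairing semantics are assumptions for illustration, not code from the released repository):

```python
def max_skip_at(progress, skips=(5, 10, 15, 5), milestones=(0.1, 0.3, 0.8, 1.0)):
    """Return the max frame-sampling skip for a training-progress
    fraction in [0, 1], assuming each skip value applies until its
    corresponding milestone is reached.

    NOTE: pairing max-skip [5, 10, 15, 5] with max-skip-itr
    [0.1, 0.3, 0.8, 1] this way is an interpretation of Table 5.
    """
    for skip, milestone in zip(skips, milestones):
        if progress <= milestone:
            return skip
    return skips[-1]

# Under this reading, the skip grows during early/mid training and
# shrinks again near the end:
print(max_skip_at(0.05))  # 5
print(max_skip_at(0.50))  # 15
print(max_skip_at(0.95))  # 5
```

A curriculum like this (small skips early, larger skips mid-training) is a common way to ease a video model into handling large temporal gaps before tightening again for fine-grained temporal consistency.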