Learning Spatial-Semantic Features for Robust Video Object Segmentation
Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that the proposed method achieves state-of-the-art performance on benchmark datasets, including the DAVIS 2017 test (87.8%), YouTube-VOS 2019 (88.1%), MOSE val (74.0%), and LVOS test (73.0%), and demonstrate the effectiveness and generalization capacity of our model. |
| Researcher Affiliation | Academia | Pengcheng Laboratory; Harbin Institute of Technology, Shenzhen; Pazhou Lab (Huangpu); Dalian University of Technology; University of California at Merced; Yonsei University |
| Pseudocode | No | The paper describes the algorithm using text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and trained models are released at https://github.com/yahooo-m/S3. |
| Open Datasets | Yes | We evaluate the proposed method extensively on five benchmarks, including DAVIS 2017 (Pont-Tuset et al., 2017), YouTube-VOS 2018 (Xu et al., 2018), YouTube-VOS 2019, LVOS (Hong et al., 2023), and MOSE (Ding et al., 2023). |
| Dataset Splits | No | The paper describes how training data is composed (e.g., MEGA dataset construction, DAVIS expanded five times) and sampling strategies for training frames/targets, but does not provide specific train/validation/test splits for the benchmark datasets in a way that would allow direct reproduction of the data partitioning used for their experiments. |
| Hardware Specification | Yes | All our models are trained on a machine with 8 x NVIDIA V100 GPUs and tested using one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions "PyTorch" for implementing the network but does not specify a version number or list any other software dependencies with version numbers. |
| Experiment Setup | Yes | For optimization, the AdamW (Kingma & Ba, 2014) optimizer is used with a learning rate of 5e-5 and a weight decay of 0.05. We train the model for 125K iterations with a batch size of 16 on the video dataset, and 195K iterations on the MEGA dataset. ... Table 5: Training parameters: optimizer AdamW; base learning rate 5e-5; weight decay 0.05; droppath rate 0.15; batch size 16; num ref frames 3; num frames 8; max-skip [5, 10, 15, 5]; max-skip-itr [0.1, 0.3, 0.8, 1]; iterations 150,000 / 190,000; learning rate schedule steplr |
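For illustration, the quoted hyperparameters can be gathered into a plain config dictionary, together with a minimal sketch of a "steplr"-style decay. This is a sketch under stated assumptions: the paper names a step learning-rate schedule but does not report the step size or decay factor, so `step_size=50_000` and `gamma=0.1` below are illustrative placeholders, not the authors' values.

```python
# Hedged sketch of the training setup quoted from Table 5.
# step_size and gamma are ASSUMED values; the report does not state them.

TRAIN_CONFIG = {
    "optimizer": "AdamW",
    "base_learning_rate": 5e-5,
    "weight_decay": 0.05,        # Table 5 value; the prose rounds differently
    "droppath_rate": 0.15,
    "batch_size": 16,
    "num_ref_frames": 3,
    "num_frames": 8,
    "max_skip": [5, 10, 15, 5],
    "max_skip_itr": [0.1, 0.3, 0.8, 1],
    "iterations": (150_000, 190_000),  # video dataset / MEGA dataset
    "lr_schedule": "steplr",
}

def step_lr(base_lr: float, iteration: int,
            step_size: int = 50_000, gamma: float = 0.1) -> float:
    """Piecewise-constant decay: multiply the learning rate by `gamma`
    every `step_size` iterations (both values are assumptions here)."""
    return base_lr * gamma ** (iteration // step_size)
```

With these assumed values, the rate would stay at 5e-5 for the first 50,000 iterations and drop by a factor of 10 at each step boundary thereafter; in PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR` wrapped around an `AdamW` optimizer.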