Order-aware Interactive Segmentation
Authors: Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that OIS achieves state-of-the-art performance, improving mIoU after one click by 7.61 on the HQSeg44K dataset and 1.32 on the DAVIS dataset as compared to the previous state-of-the-art SegNext, while also doubling inference speed compared to current leading methods. We evaluate our method and comparison methods on two widely used benchmarks for interactive segmentation: HQSeg44K (Ke et al., 2024) and DAVIS (Perazzi et al., 2016). Table 1: Performance comparison on the HQSeg44K benchmark. Table 4: Ablation experiments on DAVIS. |
| Researcher Affiliation | Collaboration | 1Northwestern University, Chicago, IL, USA 2United Imaging Intelligence, Boston, MA, USA 3University of Central Florida, Orlando, FL, USA 4University of North Carolina at Chapel Hill, Chapel Hill, NC, USA |
| Pseudocode | No | The paper describes the methodology in text and uses diagrams, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | To ensure the reproducibility of our work, we extensively describe the implementation details, such as the pre-trained model, training data, and other important training hyperparameters in Sec 3, Sec 4.1 and Sec A.1. We will release all source code pending release approval. |
| Open Datasets | Yes | We evaluate our method and comparison methods on two widely used benchmarks for interactive segmentation: HQSeg44K (Ke et al., 2024) and DAVIS (Perazzi et al., 2016). |
| Dataset Splits | No | We evaluate our method and comparison methods on two widely used benchmarks for interactive segmentation: HQSeg44K (Ke et al., 2024) and DAVIS (Perazzi et al., 2016). More details about these two datasets can be found in Appendix A.2. To be consistent with previous works, we use a subset of 345 frames to conduct the evaluation. |
| Hardware Specification | Yes | We use Adam optimizer to train our model on HQSeg44K dataset for 15 epochs on two A100 GPUs. |
| Software Dependencies | No | The paper mentions using an "Adam optimizer" and a "frozen ViT-Base encoder from Depth Anything V2" but does not provide specific version numbers for these or other key software components (e.g., PyTorch, TensorFlow, specific Python libraries). |
| Experiment Setup | Yes | The input image size is set to 1024. We set the maximum number of clicks N as 48 during training, with the first 24 for positive clicks and the remaining for negative. We use three identical blocks of object-level and order-level understanding modules (Sec. 3.2 and Sec. 3.3) to conduct prompt fusion. In these modules, the Feed-Forward Network (FFN) is implemented as a 2-layer Multi-Layer Perceptron (MLP). Each attention module is followed by a LayerNorm, which normalizes the sparse embeddings. We use the Adam optimizer to train our model on the HQSeg44K dataset for 15 epochs on two A100 GPUs. |
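The mIoU-after-k-clicks numbers quoted in the Research Type row are averages of per-mask intersection-over-union scores. A minimal sketch of that metric (function names are illustrative, not from the paper):

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def miou(preds, gts) -> float:
    """Mean IoU over a dataset of (prediction, ground-truth) mask pairs."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: two 4x4 masks.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True  # 4 px predicted
gt = np.zeros((4, 4), dtype=bool); gt[:2, :3] = True      # 6 px ground truth
print(iou(pred, gt))  # intersection 4, union 6 -> 0.666...
```

In benchmark protocols such as "mIoU after one click", the prediction is the mask produced after the first simulated user click on each image.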
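The Experiment Setup row describes each prompt-fusion block as an attention module whose output passes through a 2-layer MLP FFN and a LayerNorm over the sparse click embeddings. A minimal numpy sketch of that FFN + LayerNorm sub-block, assuming illustrative dimensions and a residual connection (the paper does not specify hidden sizes or activations here):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each embedding vector to zero mean, unit variance (last axis)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(x, w1, b1, w2, b2):
    """2-layer MLP with ReLU, standing in for the block's FFN."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

rng = np.random.default_rng(0)
d, hidden, n_clicks = 256, 1024, 48            # dims are assumptions
x = rng.standard_normal((n_clicks, d))          # sparse click embeddings
w1, b1 = 0.02 * rng.standard_normal((d, hidden)), np.zeros(hidden)
w2, b2 = 0.02 * rng.standard_normal((hidden, d)), np.zeros(d)

out = layer_norm(x + ffn(x, w1, b1, w2, b2))    # residual, then LayerNorm
print(out.shape)  # (48, 256)
```

Per the setup, three such blocks are stacked to fuse object-level and order-level information before decoding.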