MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation
Authors: Minhyun Lee, Seungho Lee, Song Park, Dongyoon Han, Byeongho Heo, Hyunjung Shim
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that MaskRIS can easily be applied to various RIS models, outperforming existing methods in both fully supervised and weakly supervised settings. Finally, MaskRIS achieves new state-of-the-art performance on RefCOCO, RefCOCO+, and RefCOCOg datasets. |
| Researcher Affiliation | Collaboration | Minhyun Lee, AI Center, Samsung Electronics; Seungho Lee, AI Center, Samsung Electronics; Song Park; Dongyoon Han, NAVER AI Lab; Byeongho Heo, NAVER AI Lab; Hyunjung Shim, Korea Advanced Institute of Science & Technology (KAIST) |
| Pseudocode | No | The paper describes the MaskRIS framework, input masking strategy, and Distortion-aware Contextual Learning using mathematical formulations and textual descriptions, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/naver-ai/maskris. |
| Open Datasets | Yes | Datasets. We evaluate our method using three popular benchmarks in RIS: RefCOCO, RefCOCO+, and RefCOCOg. RefCOCO (Yu et al., 2016), all built on the MS COCO dataset (Lin et al., 2014). ... To further examine cross-dataset generalization beyond COCO-style images, we additionally evaluate MaskRIS on RefClef, a subset of the ImageCLEF dataset with more diverse natural scenes and object categories (see Appendix A.1 for details). |
| Dataset Splits | Yes | We evaluate our method using three popular benchmarks in RIS: RefCOCO, RefCOCO+, and RefCOCOg. ... On RefCOCO+, our method leads by 1.37%p, 2.76%p, and 1.93%p on the validation, test A, and test B splits, respectively. Even on the challenging RefCOCOg dataset, MaskRIS still outperforms CARIS by 0.89%p and 1.05%p on the validation and test splits. ... For this dataset [RefCOCOg], we report results on the UMD partition (Yu et al., 2016), following the previous studies (Wang et al., 2022; Liu et al., 2023b; Kim et al., 2023b). |
| Hardware Specification | No | The paper mentions that "Some parts of experiments are based on the NAVER Smart Machine Learning (NSML) (Kim et al., 2018) platform." in the Acknowledgments. However, it does not specify any details about the CPU, GPU models, memory, or other specific hardware configurations used for the experiments. |
| Software Dependencies | No | Most of our experimental results are based on CARIS (Liu et al., 2023c). For the image encoder, we used the Swin-Base Transformer (Liu et al., 2021b), pre-trained on ImageNet-22k (Deng et al., 2009), and for the text encoder, we employed BERT-Base (Devlin et al., 2018). The maximum length of the text is set to 20 words. We used the AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 0.01. ... While various software components like Swin-Base Transformer, BERT-Base, and AdamW are mentioned, specific version numbers for these or other critical software dependencies (e.g., Python, PyTorch, CUDA) are not provided. |
| Experiment Setup | Yes | Designed as a plug-and-play training strategy, we strictly follow the original training settings and hyperparameters, such as learning rate, epochs, and batch size, without modification. Notably, we primarily implemented our method on CARIS (Liu et al., 2023c), a leading SoTA method, unless stated otherwise. Images are resized to 448 × 448 for both training and testing. For image masking, we set 32 as the patch size. ... We used the AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 0.01. We applied different learning rates of 1e-5 and 1e-4 to encoders and the others, respectively, with a polynomial learning rate schedule with a power of 0.9. The model was trained for 50 epochs with a batch size of 16, and the input images were resized to 448 × 448. ... While our default setting uses λ = 0.5, we also provide a detailed sensitivity analysis of this ratio in Appendix A.5, showing that MaskRIS is robust across a wide range of choices. |
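The Experiment Setup row specifies 448 × 448 inputs, a 32 × 32 masking patch size, and a default masking ratio λ = 0.5. A minimal NumPy sketch of what such patch-level input masking could look like is shown below; the function name `random_patch_mask` and its exact masking scheme are illustrative assumptions, not the authors' released implementation (see the linked GitHub repository for that).

```python
import numpy as np

def random_patch_mask(image, patch_size=32, mask_ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical sketch of patch-level input masking with the paper's
    reported settings (448x448 input, 32x32 patches, ratio lambda = 0.5);
    the actual MaskRIS code may differ in detail.
    """
    rng = rng or np.random.default_rng()
    c, h, w = image.shape
    gh, gw = h // patch_size, w // patch_size  # 14 x 14 patch grid for 448/32
    n_mask = int(gh * gw * mask_ratio)
    # Choose patches to mask uniformly at random, without replacement.
    idx = rng.choice(gh * gw, size=n_mask, replace=False)
    keep = np.ones(gh * gw, dtype=image.dtype)
    keep[idx] = 0.0
    # Upsample the patch-level keep mask to pixel resolution and apply it.
    mask = np.kron(keep.reshape(gh, gw),
                   np.ones((patch_size, patch_size), dtype=image.dtype))
    return image * mask[None]

img = np.ones((3, 448, 448), dtype=np.float32)
masked = random_patch_mask(img, rng=np.random.default_rng(0))
print(masked.mean())  # 0.5: exactly half the pixels are zeroed
```

With λ = 0.5 on a 14 × 14 patch grid, exactly 98 of 196 patches are masked, so half of the pixels survive regardless of the random seed.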