TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Authors: Yao Xiao, Qiqian Fu, Heyi Tao, Yuqun Wu, Zhen Zhu, Derek Hoiem

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We conduct extensive evaluations and consistently achieve superior or competitive performance compared to state-of-the-art training-free methods." (Sec. 4, Experiments: "We show the strong region classification capabilities of TextRegion in Sec. 4.1") |
| Researcher Affiliation | Academia | Siebel School of Computing and Data Science, University of Illinois at Urbana-Champaign |
| Pseudocode | No | The paper describes its methods using mathematical formulas and descriptive text in sections such as "3.2 TextRegion Approach" and "A.3 CLIP Variants", but contains no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/avaxiao/TextRegion |
| Open Datasets | Yes | "We evaluate on six widely used semantic segmentation benchmarks: PASCAL VOC 2012 (Everingham et al., 2015), PASCAL Context (Mottaghi et al., 2014), COCO-Stuff (Caesar et al., 2018), COCO-Object (Lin et al., 2014), Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2019)." |
| Dataset Splits | No | The paper evaluates on well-known benchmarks (e.g., PASCAL VOC 2012, COCO, RefCOCO) for which standard splits are typically used, but it does not explicitly state split percentages, sample counts, or direct citations for the splits in the main text. |
| Hardware Specification | Yes | "This work used NVIDIA GPUs at NCSA Delta through allocation CIS240059 and CIS250059 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program... measured on one A100 GPU." |
| Software Dependencies | No | The paper mentions using SAM2 (Ravi et al., 2024) with a Hiera-Large backbone and specific configurations, but it does not give version numbers for general software dependencies such as Python, PyTorch, or CUDA, which are crucial for reproducibility. |
| Experiment Setup | Yes | "For all experiments, we filter global patches using a threshold of τ = 0.07. The crop size is uniformly set to 336 for all CLIP models (ViT-B/16 through ViT-H/14), while SigLIP2 and Perception Encoder use their respective default input resolutions. Region masks are generated with SAM2 (Ravi et al., 2024) Hiera-Large, using the following configuration: pred-iou-thresh set to 0.6, stability-score-thresh to 0.6, box-nms-thresh to 0.9, and points-per-side to 16. In the semantic segmentation experiments on the Cityscapes dataset, we increase points-per-side to 36 due to its high resolution and the abundance of small objects. To mitigate the impact of duplicated or overlapping masks, we also merge masks with an overlap IoU greater than 0.8." |
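The last step quoted above (merging near-duplicate SAM2 masks whose pairwise overlap IoU exceeds 0.8) can be sketched as follows. This is an illustrative greedy implementation, not the authors' released code; the function names and the set-of-pixels mask representation are assumptions for clarity (real pipelines would typically use boolean arrays).

```python
def mask_iou(a, b):
    """IoU between two binary masks represented as sets of pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def merge_duplicate_masks(masks, iou_thresh=0.8):
    """Greedily fold each mask into the first kept mask it overlaps with
    IoU > iou_thresh (taking the union), otherwise keep it as a new mask.
    Mirrors the paper's stated rule of merging masks with overlap IoU > 0.8."""
    merged = []
    for m in masks:
        for i, kept in enumerate(merged):
            if mask_iou(m, kept) > iou_thresh:
                merged[i] = kept | m  # union the near-duplicate masks
                break
        else:
            merged.append(set(m))
    return merged
```

For example, a 9-pixel mask and the same mask with one extra pixel have IoU 0.9 and collapse into a single region, while a disjoint mask is kept separate.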