Instruct2See: Learning to Remove Any Obstructions Across Distributions
Authors: Junhang Li, Yu Guo, Chuhua Xian, Shengfeng He
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both in-distribution and out-of-distribution obstacles show that Instruct2See consistently achieves strong performance and generalization in obstruction removal, regardless of whether the obstacles were present during the training phase. Code and dataset are available at https://jhscut.github.io/Instruct2See. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, South China University of Technology; (2) School of Computing and Information Systems, Singapore Management University; (3) School of Navigation, Wuhan University of Technology. Correspondence to: Chuhua Xian and Shengfeng He <EMAIL, EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Instruct2See Model Inference |
| Open Source Code | Yes | Code and dataset are available at https://jhscut.github.io/Instruct2See. |
| Open Datasets | Yes | Code and dataset are available at https://jhscut.github.io/Instruct2See. Datasets. We utilize 3,984 images for model training. For the fence obstacle, we select 897 clear images from the BSD dataset (Martin et al., 2001) and the UCID dataset (Schaefer & Stich, 2003), and generate paired data using the fence synthesis method from (Du et al., 2018). Additionally, 987 clear images from the Flickr24K dataset (Zhang et al., 2018) and 987 flare images from the Flare7K dataset (Dai et al., 2022) are used to create flare image pairs. We also include 2,100 training image pairs from the VRDS dataset (Wu et al., 2023). For unseen obstructions, we sourced 100 test images each from the rain streak dataset (Yang et al., 2017), snowy dataset (Liu et al., 2018), and stroke dataset (Lugmayr et al., 2022). |
| Dataset Splits | Yes | Datasets. We utilize 3,984 images for model training. For the fence obstacle, we select 897 clear images from the BSD dataset (Martin et al., 2001) and the UCID dataset (Schaefer & Stich, 2003), and generate paired data using the fence synthesis method from (Du et al., 2018). Additionally, 987 clear images from the Flickr24K dataset (Zhang et al., 2018) and 987 flare images from the Flare7K dataset (Dai et al., 2022) are used to create flare image pairs. We also include 2,100 training image pairs from the VRDS dataset (Wu et al., 2023). For testing, we apply the same synthesis strategy to create a fence test dataset with 100 image pairs. Moreover, a flare test dataset with another 100 image pairs is used. Additionally, 500 raindrop test image pairs are included. For unseen obstructions, we sourced 100 test images each from the rain streak dataset (Yang et al., 2017), snowy dataset (Liu et al., 2018), and stroke dataset (Lugmayr et al., 2022). |
| Hardware Specification | Yes | Our Instruct2See framework is implemented in PyTorch 1.12.0 and trained on a system equipped with 2 AMD EPYC 7543 32-Core CPUs and 8 NVIDIA L40 GPUs. |
| Software Dependencies | Yes | Our Instruct2See framework is implemented in PyTorch 1.12.0 and trained on a system equipped with 2 AMD EPYC 7543 32-Core CPUs and 8 NVIDIA L40 GPUs. We utilize the CLIP ViT-B/32 model. For obstructions like rain streaks and snow, which are more challenging to segment, we employ a U-Net-based model (Ronneberger et al., 2015) to generate the initial mask. For other obstructions, we use the Segment Anything Model 2 (SAM2) (Ravi et al., 2024). |
| Experiment Setup | Yes | Our Instruct2See framework is implemented in PyTorch 1.12.0 and trained on a system equipped with 2 AMD EPYC 7543 32-Core CPUs and 8 NVIDIA L40 GPUs. We train the model using the AdamW optimizer (β1 = 0.9, β2 = 0.999, weight decay of 1 × 10−4) and L1 loss, over 300K iterations. The initial learning rate is set to 3 × 10−4. A progressive learning strategy is employed, starting with a patch size of 128 × 128 and a batch size of 1. The patch size is progressively updated to 128 × 128, 160 × 160, 192 × 192, and 256 × 256 at iterations 115,000, 80,000, 60,000, and 45,000, respectively. We also apply horizontal and vertical flips for data augmentation. |
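The progressive-learning quote above is ambiguous about whether the iteration counts are milestones or per-stage durations; since 115,000 + 80,000 + 60,000 + 45,000 equals the stated 300K total, one plausible reading is that each count is the number of iterations spent at that patch size. The sketch below encodes that assumed interpretation (the stage table and `patch_size_at` helper are hypothetical, not from the paper):

```python
# Hypothetical sketch of the progressive patch-size schedule quoted in the
# Experiment Setup row. ASSUMPTION: the four iteration counts are per-stage
# durations (they sum to the stated 300K total), not absolute milestones.

STAGES = [  # (duration in iterations, square patch side length)
    (115_000, 128),
    (80_000, 160),
    (60_000, 192),
    (45_000, 256),
]


def patch_size_at(iteration: int) -> int:
    """Return the training patch size assumed to be in effect at `iteration`."""
    boundary = 0
    for duration, size in STAGES:
        boundary += duration
        if iteration < boundary:
            return size
    return STAGES[-1][1]  # beyond 300K iterations, keep the final patch size
```

Under this reading, training runs at 128 × 128 for the first 115K iterations, then steps up to 160, 192, and finally 256 for the last 45K iterations; the paper's own code release should be consulted to confirm the exact schedule.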