Adaptive Language-Aware Image Reflection Removal Network
Authors: Siyan Fang, Yuntao Wang, Jinpu Zhang, Ziwen Li, Yuehuan Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the model's performance under complex reflections and varying levels of language accuracy, we introduce the Complex Reflection and Language Accuracy Variance (CRLAV) dataset. Experimental results demonstrate that ALANet surpasses state-of-the-art methods for image reflection removal. ... Experiments demonstrate that the proposed ALANet surpasses state-of-the-art (SOTA) methods and achieves solid performance even with inaccurate language inputs. ... Quantitative comparison results across public datasets are shown in Table 1. ... Table 2 presents a quantitative comparison between ALANet and other methods on the CRLAV dataset... Table 4 illustrates the performance of our ALANet under varying degrees of language accuracy. ... Ablation study on LCAM. Table 5 illustrates the contributions of language-guided attention and channel attention to performance within LCAM. ... Ablation study on ALCM. Table 5 showcases the performance of models with and without ALCM. ... Ablation study on LSCT. To demonstrate the effectiveness of LSCT, we conducted experiments by further removing LSCT on top of removing ALCM, with the results shown in Table 5. |
| Researcher Affiliation | Academia | Siyan Fang (1), Yuntao Wang (1), Jinpu Zhang (2), Ziwen Li (1) and Yuehuan Wang (1); (1) Huazhong University of Science and Technology; (2) National University of Defense Technology; EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures describing module structures (Figure 2, 3, 4, 5, 6) but does not contain any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | The code and dataset are available at https://github.com/fashyon/ALANet. |
| Open Datasets | Yes | To evaluate the model's performance under complex reflections and varying levels of language accuracy, we introduce the Complex Reflection and Language Accuracy Variance (CRLAV) dataset. The code and dataset are available at https://github.com/fashyon/ALANet. ... For synthetic images, we generate data using the popular image captioning dataset Flickr8k [Hodosh et al., 2013]... For the real-world training data, following prior works [Zhong et al., 2024; Hu and Guo, 2023; Zhu et al., 2023], we train our model using 200 image pairs from the Nature dataset [Li et al., 2020] and 90 image pairs from the Real dataset [Zhang et al., 2018]. We use the remaining images from the Nature and Real datasets, along with the three subsets Wild, Postcard, and Solid from the SIR2 dataset [Wan et al., 2017] as public test sets. The CRLAV dataset is also included as a test set. |
| Dataset Splits | Yes | For synthetic images, we generate data using the popular image captioning dataset Flickr8k [Hodosh et al., 2013], which contains 8,091 images, each with five different language descriptions. ... For the real-world training data, following prior works [Zhong et al., 2024; Hu and Guo, 2023; Zhu et al., 2023], we train our model using 200 image pairs from the Nature dataset [Li et al., 2020] and 90 image pairs from the Real dataset [Zhang et al., 2018]. We use the remaining images from the Nature and Real datasets, along with the three subsets Wild, Postcard, and Solid from the SIR2 dataset [Wan et al., 2017] as public test sets. The CRLAV dataset is also included as a test set. |
| Hardware Specification | Yes | The model is trained for 70 epochs using the Adam optimizer [Kingma and Ba, 2014] with a single RTX 3080 Ti GPU. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer [Kingma and Ba, 2014] and deep learning frameworks implicitly through its network architecture. However, it does not specify version numbers for any key software components like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | To balance performance and parameter count, the channel numbers from level 0 to level 4 of the network are set to [C0, C1, C2, C3, C4] = [64, 128, 128, 160, 160], and the number of LASBs at each level (N0 to N4) is set to 2. The initial learning rate is 10^-4, with a batch size of 1, a patch size of 224×224, and random flipping applied for data augmentation. The model is trained for 70 epochs using the Adam optimizer [Kingma and Ba, 2014] with a single RTX 3080 Ti GPU. The learning rate decreases to 10^-5 at 50 epochs. |
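The learning-rate schedule reported in the setup row (10^-4 for the first 50 epochs, dropping to 10^-5 for the remainder of the 70-epoch run) amounts to a single-step decay. A minimal sketch of that schedule, assuming a plain step function rather than the authors' actual training code (`alanet_lr` is a hypothetical helper name):

```python
def alanet_lr(epoch, base_lr=1e-4, decay_epoch=50, decayed_lr=1e-5):
    """Step learning-rate schedule matching the reported setup:
    1e-4 until epoch 50, then 1e-5 through epoch 70."""
    return base_lr if epoch < decay_epoch else decayed_lr

# Learning rate for each of the 70 training epochs.
schedule = [alanet_lr(e) for e in range(70)]
```

In a framework such as PyTorch, the same behavior would typically be obtained with a multiplicative step scheduler (factor 0.1 at epoch 50) attached to the Adam optimizer.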