ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
Authors: Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, Weili Guan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four benchmark datasets demonstrate the superiority of our proposed method. |
| Researcher Affiliation | Academia | 1School of Software, Shandong University 2School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) 3School of Data Science, City University of Hong Kong |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Moreover, we have released our codes to facilitate other researchers1. 1https://sdu-l.github.io/ENCODER.github.io/ |
| Open Datasets | Yes | Following previous works, we chose four benchmark datasets for evaluation, including three fashion-domain datasets, Fashion IQ (Wu et al. 2021), Shoes (Guo et al. 2018), Fashion200K (Han et al. 2017) and an open-domain dataset CIRR (Liu et al. 2021b). |
| Dataset Splits | No | The paper mentions evaluation metrics for different datasets (e.g., R@k for Shoes and Fashion200K, R@10, R@50 for Fashion IQ), but it does not explicitly state the training, validation, or test splits for these datasets. For example, it mentions a batch size but not the dataset split ratios or counts. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA Tesla T4 GPU with 16GB memory and trained 10 epochs. |
| Software Dependencies | No | ENCODER is built upon the pretrained CLIP (Radford et al. 2021) (ViT-B/32 version). We trained ENCODER using the AdamW optimizer... The paper mentions CLIP and its version (ViT-B/32) and the AdamW optimizer, but it does not specify versions for other ancillary software such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | ENCODER is built upon the pretrained CLIP (Radford et al. 2021) (ViT-B/32 version). We trained ENCODER using the AdamW optimizer with the initial learning rate of 5e-5, while the batch size is set to 128 and the learning rate for CLIP is 1e-6. Empirically, we maintained a consistent embedding dimension D of 512 throughout the network. We set the latent factor number P to 4 and the query number E of LRQ to 3. We also adopt the temperature factor τ of 0.1 for Eqn. (9, 13, 14). Through a comprehensive grid search, we set κ = 0.8, γ = 0.5, and µ = 0.5 for all four datasets. All experiments were conducted on a single NVIDIA Tesla T4 GPU with 16GB memory and trained 10 epochs. |
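The reported hyperparameters can be collected into a single configuration, which also makes the two-rate optimizer setup (5e-5 for new modules, 1e-6 for the CLIP backbone) explicit. This is a minimal sketch for reproduction purposes; the class and field names are illustrative assumptions, not taken from the released ENCODER code.

```python
# Hypothetical training configuration assembled from the values reported
# in the paper; names here are illustrative, not the authors' own.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    backbone: str = "ViT-B/32"    # pretrained CLIP variant
    lr: float = 5e-5              # initial learning rate for new modules
    clip_lr: float = 1e-6         # lower learning rate for the CLIP backbone
    batch_size: int = 128
    embed_dim: int = 512          # embedding dimension D
    num_latent_factors: int = 4   # latent factor number P
    num_lrq_queries: int = 3      # query number E of LRQ
    temperature: float = 0.1      # tau in Eqn. (9, 13, 14)
    kappa: float = 0.8            # grid-searched loss weights
    gamma: float = 0.5
    mu: float = 0.5
    epochs: int = 10


def optimizer_param_groups(clip_params, other_params, cfg: TrainConfig):
    """Two parameter groups so an AdamW-style optimizer can apply the
    smaller learning rate to the CLIP backbone only."""
    return [
        {"params": clip_params, "lr": cfg.clip_lr},
        {"params": other_params, "lr": cfg.lr},
    ]


cfg = TrainConfig()
groups = optimizer_param_groups([], [], cfg)
```

In a PyTorch-style setup, `groups` would be passed directly to the optimizer constructor (e.g. `AdamW(groups)`), which is the usual way to fine-tune a pretrained backbone at a reduced rate.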