Ranking-aware adapter for text-driven image ordering with CLIP

Authors: Wei-Hsiang Yu, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on four tasks spanning diverse numerical concepts, including facial aging estimation (Samek et al., 2017), object count sorting (Singh et al., 2024), image quality/aesthetics assessment (Hosu et al., 2020; Murray et al., 2012), and dating historical colored images (Palermo et al., 2012). Our approach consistently performs favorably against CLIP baselines and state-of-the-art methods in terms of ranking and retrieval qualities, even though these competing methods are fine-tuned for target tasks.
Researcher Affiliation | Collaboration | Wei-Hsiang Yu (1), Yen-Yu Lin (1), Ming-Hsuan Yang (2), Yi-Hsuan Tsai (3); (1) National Yang Ming Chiao Tung University, (2) UC Merced, (3) Atmanity Inc.
Pseudocode | No | The paper includes architectural diagrams (Figure 2, Figure 3) but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/uynaes/RankingAwareCLIP
Open Datasets | Yes | Facial age estimation. Facial age estimation predicts the age of a given face. We use the Adience dataset (Samek et al., 2017), which includes 13,027 images labeled across 8 age groups, following the data split from Wang et al. (2023d). Historical colored image dating. The historical colored image dataset (Palermo et al., 2012) is a widely-used benchmark for predicting the decade of a given historical colored image, consisting of 1,325 images labeled across 5 decades ranging from the 1930s to the 1970s. Image quality and aesthetics assessment. For ranking images based on subjective preference and objective properties, we employ the KonIQ-10k dataset (Hosu et al., 2020) for assessing image quality and the Aesthetic Visual Analysis (AVA) dataset (Murray et al., 2012) for evaluating image aesthetics. Object count sorting. ... The COCO-REM dataset (Singh et al., 2024), an annotation-revised version of the COCO dataset, serves as the test bed. Table 7: Facial Age Estimation Results on the UTKFace Dataset. ... UTKFace dataset (Zhang et al., 2017). A.2 OBJECT COUNT SORTING ON THE CLEVR DATASET. The CLEVR dataset (Johnson et al., 2017). As shown in Figure 9, we evaluate our ranking adapter on unseen categories using the LVIS dataset (Gupta et al., 2019). As shown in Figure 10, we sample five images from the highest to lowest MOS at the same interval, finding that our model shows high agreement with subjective scores. AGIQA-3k dataset (Li et al., 2023a).
Dataset Splits | Yes | Facial age estimation. ...following the data split from Wang et al. (2023d). Historical colored image dating. ...We follow the standard ordinal regression setting as that in Wang et al. (2023d). Image quality and aesthetics assessment. ...We evaluate models using the official splits, which have 2,015 and 19,930 test images in the KonIQ-10k and AVA datasets, respectively. Object count sorting. ...comprising 118,287 training and 5,000 testing images. A.1 FACIAL AGE ESTIMATION ON THE UTKFACE DATASET. ...Following the preprocessing and data split described in Kuprashevich & Tolstykh (2023), we train our ranking-aware adapter for 20k steps with a batch size of 64, a learning rate of 5e-5, and a weight decay of 0.01.
Hardware Specification | Yes | We conduct all experiments on one NVIDIA RTX-3090Ti GPU.
Software Dependencies | No | The paper mentions using 'OpenCLIP' and optimizing with 'Smooth L1 loss', 'Hinge loss', and 'AdamW optimizer' but does not specify version numbers for these or other software components such as programming languages or libraries.
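Since the paper pins no dependency versions, a reproducibility run can at least record what is installed locally. The sketch below uses only the Python standard library; the package names in the usage example (`torch`, `open_clip_torch`) are illustrative guesses at the stack, not versions stated by the paper.

```python
import platform
from importlib import metadata

def capture_versions(packages):
    """Return interpreter and package versions for a reproducibility log."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Example call (package names are illustrative, not pinned by the paper):
# capture_versions(["torch", "open_clip_torch"])
```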
Experiment Setup | Yes | We implement the ranking adapter upon the OpenCLIP (Ilharco et al., 2021) framework and optimize the model using a combination of Smooth L1 loss and Hinge loss with an AdamW optimizer at a learning rate of 1e-5, weight decay of 0.01, and batch size of 64. We use 220k steps for object count sorting and 144k steps for image quality assessment, facial age estimation, and historical image dating tasks. ...where α is a hyperparameter (set to 0.2 in our experiments) to balance the importance of the regression and ranking objectives. We apply random horizontal flipping as data augmentation and resize the images to 320×320 without cropping.
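The loss setup quoted above can be sketched as follows. This is a minimal PyTorch illustration assuming a batch of scalar scores: α = 0.2 follows the paper, but `margin`, the pairwise pairing scheme, and the weighting direction (regression term plus α times ranking term) are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, alpha=0.2, margin=0.1):
    """Smooth L1 regression loss plus a pairwise hinge ranking loss.

    `alpha` follows the 0.2 reported in the paper; `margin` and the exact
    pairing scheme are illustrative assumptions.
    """
    # Regression term: Smooth L1 between predicted and ground-truth scores.
    reg = F.smooth_l1_loss(pred, target)

    # Ranking term: for every ordered pair (i, j), require the predicted
    # score difference to agree in sign with the ground-truth difference
    # by at least `margin`.
    d_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # d_pred[i, j] = pred[j] - pred[i]
    d_true = target.unsqueeze(0) - target.unsqueeze(1)
    sign = torch.sign(d_true)
    pairs = sign != 0                                   # ignore tied pairs
    if pairs.any():
        rank = F.relu(margin - sign * d_pred)[pairs].mean()
    else:
        rank = pred.new_zeros(())

    return reg + alpha * rank

# Optimizer settings quoted in the row above would correspond to:
# torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
```

A correctly ordered batch contributes only the (small) regression term, since every pairwise hinge is satisfied once predicted differences exceed the margin in the right direction.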