Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning

Authors: Yuti Liu, Shice Liu, Junyuan Gao, Peng-tao Jiang, Hao Zhang, Jinwei Chen, Bo Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evidence indicates that, accompanied by extensive instruction tuning, our model sets new state-of-the-art benchmarks across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment.
Researcher Affiliation | Industry | vivo Mobile Communication Co., Ltd, Shanghai, China
Pseudocode | No | The paper describes the architecture of CALM and its training process in textual descriptions and diagrams (Figures 2, 3, 4) but does not include a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Self-supervised pre-training encourages the three Q-Formers in the MFAM to learn aesthetic attributes in a self-supervised manner, utilizing unlabeled images from diverse sources, including AVA (Murray et al. 2012), AADB (Kong et al. 2016), EVA (Kang et al. 2020), ICAA, PCCD (Chang et al. 2017), Pexels (Pfister et al. 2021), SPAQ (Fang et al. 2020), and TAD66K (He et al. 2022). The training data comprises a 558K subset of LAION-CC-SBU (Schuhmann et al. 2022; Changpinyo et al. 2021; Saleh and Elgammal 2015) and ShareGPT4V (Chen et al. 2023).
Dataset Splits | Yes | The AVA dataset comprises over 250,000 images with scores rated by users on the DPChallenge website. We used the official split, designating 19,928 images as the test set and the remainder for training. The AVA-Captions dataset contains approximately 230,000 images, each with an average of 5 user comments. To prevent data leakage, images from the AVA test set are excluded from AVA-Captions training, resulting in 210,000 images for training and 9,361 for testing. FLICKR-AES includes 35,263 images rated by 173 annotators in the training set and 4,737 images evaluated by 37 annotators in the test set, along with user identifications.
Hardware Specification | Yes | Training was conducted on eight 80GB A100 GPUs, utilizing the Adam optimizer (Kingma and Ba 2014).
Software Dependencies | No | The paper mentions models like Vicuna-7B and GPT-3.5, and the Adam optimizer, but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions) required to replicate the experiments.
Experiment Setup | Yes | The peak learning rate was set to 1e-3 for the pre-training stage, and 2.5e-5 and 7e-5 for the two processes in the fine-tuning stage, respectively. Both stages commenced with a linear warm-up, followed by a cosine annealing schedule (Loshchilov and Hutter 2016), with durations of 5 hours and 16.5 hours, respectively.
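The warm-up plus cosine-annealing schedule described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the step counts (`total_steps`, `warmup_steps`) are hypothetical, since the paper reports stage durations in hours rather than steps; only the peak learning rate (1e-3 for pre-training) comes from the paper.

```python
import math

def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_steps: int) -> float:
    """Linear warm-up to peak_lr, then cosine annealing down to zero.

    Hypothetical schedule sketch; step counts are illustrative, and
    only peak_lr (e.g. 1e-3 for pre-training) is taken from the paper.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warm-up phase.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative values with the pre-training peak LR of 1e-3:
print(lr_at_step(500, 10_000, 1e-3, 1_000))     # mid-warm-up: 5e-4
print(lr_at_step(1_000, 10_000, 1e-3, 1_000))   # end of warm-up: 1e-3
print(lr_at_step(10_000, 10_000, 1e-3, 1_000))  # end of training: ~0
```

The same function covers both fine-tuning processes by swapping in their peak rates (2.5e-5 and 7e-5).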