Semi-Supervised Blind Quality Assessment with Confidence-quantifiable Pseudo-label Learning for Authentic Images

Authors: Yan Zhong, Chenxi Yang, Suyuan Zhao, Tingting Jiang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on authentically distorted image databases are conducted to validate the applicability and effectiveness of the proposed method. In this section, we conduct experiments on authentically distorted images to validate the superiority of CPL-IQA. We evaluate BIQA models by four typical metrics, including Pearson Linear Correlation Coefficient (PLCC), Spearman Rank-order Correlation Coefficient (SRCC), Kendall Rank-order Correlation Coefficient (KRCC), and Root Mean Squared Error (RMSE). We investigate the impacts of different components of CPL-IQA. Firstly, we record the impact of each stage on the final results on KonIQ-10k and SPAQ with different split ratios, which are shown in Table 3.
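The four evaluation metrics quoted above can be computed directly from predicted scores and ground-truth MOS values. The sketch below is illustrative (the function name and NumPy-only implementation are this report's, not the paper's); the SRCC rank computation assumes no tied scores:

```python
import numpy as np

def iqa_metrics(pred, mos):
    """PLCC, SRCC, KRCC, and RMSE between predicted scores and MOS."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    # PLCC: Pearson linear correlation of the raw scores
    plcc = np.corrcoef(pred, mos)[0, 1]
    # SRCC: Pearson correlation of the rank vectors
    # (double argsort gives ranks; assumes no tied scores)
    rp = pred.argsort().argsort()
    rm = mos.argsort().argsort()
    srcc = np.corrcoef(rp, rm)[0, 1]
    # KRCC: (concordant - discordant pairs) / total pairs
    n = len(pred)
    s = sum(np.sign(pred[i] - pred[j]) * np.sign(mos[i] - mos[j])
            for i in range(n) for j in range(i + 1, n))
    krcc = 2.0 * s / (n * (n - 1))
    # RMSE: root mean squared prediction error
    rmse = np.sqrt(np.mean((pred - mos) ** 2))
    return plcc, srcc, krcc, rmse
```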
Researcher Affiliation Academia 1School of Mathematical Sciences, Peking University, Beijing, China 2State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China 3Department of Computer Science and Technology, Tsinghua University, Beijing, China 4National Biomedical Imaging Center, Peking University, Beijing, China. Correspondence to: Yan Zhong <EMAIL>, Tingting Jiang <EMAIL>.
Pseudocode Yes The pseudo-code of CPL-IQA is summarized in Algorithm 1 in Appendix C, where successive line ranges of the algorithm correspond to Label Conversion, Stage 1, Step 1 of Stage 2, and Step 2 of Stage 2, respectively.
Open Source Code No The paper does not contain any explicit statement about code release, a link to a code repository, or mention of code provided in supplementary materials.
Open Datasets Yes We perform the main experiments on four representative authentically distorted image databases, including KonIQ-10K (Hosu et al., 2020), LIVE-C (Ghadiyaram & Bovik, 2015), NNID (Xiang et al., 2019) and SPAQ (Fang et al., 2020). In addition to the main experiments, further databases listed in Table 9 are used in extra experiments in Appendix E, including BID (Ciancio et al., 2010) and KADID-10K (Lin et al., 2019).
Dataset Splits Yes We divide the image set of KonIQ-10K by 1:3:1, which corresponds to the ratio of the number of training images with labels, training images without labels and test images, respectively, to conduct the comparative experiments to analyze the effectiveness of CPL-IQA. In addition, we study the impact of cardinal number m of the score set M mentioned in Section 3.3 and the split ratio of image samples, the results of which are shown in Table 4 and Table 5. Table 5 (impact of the split ratio of datasets, fixed m = 20) covers KonIQ-10K ratios 1:7:2, 2:6:2, 3:5:2 and SPAQ ratios 1:8:1, 2:7:1, 3:6:1. Specifically, during the training process, we consistently used 20% of the KonIQ-10k dataset as labeled training samples and an additional 20% as test samples.
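A labeled/unlabeled/test split by an integer ratio such as 1:3:1 can be sketched as follows (function name, seeding, and rounding scheme are this report's illustrative choices; the paper does not specify its splitting code):

```python
import random

def split_indices(n, ratio, seed=0):
    """Split n sample indices into labeled / unlabeled / test subsets
    according to an integer ratio such as (1, 3, 1)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    total = sum(ratio)
    n_lab = n * ratio[0] // total      # labeled training images
    n_unlab = n * ratio[1] // total    # unlabeled training images
    labeled = idx[:n_lab]
    unlabeled = idx[n_lab:n_lab + n_unlab]
    test = idx[n_lab + n_unlab:]       # remainder goes to the test set
    return labeled, unlabeled, test

# KonIQ-10K contains 10,073 images; split 1:3:1 as in the comparison setup
lab, unlab, test = split_indices(10073, (1, 3, 1))
```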
Hardware Specification Yes Our CPL-IQA is trained with the PyTorch library on two Intel Xeon E5-2609 v4 CPUs and four NVIDIA RTX 2080Ti GPUs.
Software Dependencies No Our CPL-IQA is trained with the PyTorch library on two Intel Xeon E5-2609 v4 CPUs and four NVIDIA RTX 2080Ti GPUs. The graph G in Eq. 3 is computed with the FAISS library (Johnson et al., 2019). The paper mentions PyTorch and FAISS libraries but does not specify their version numbers.
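The paper computes the nearest-neighbor graph G with FAISS; for a self-contained illustration, the same construction (k-NN search plus a Gaussian affinity, with k = 10 and σ = 500 as reported in the setup) can be sketched in plain NumPy. The function name and symmetrization choice are assumptions of this report, not details from the paper:

```python
import numpy as np

def knn_affinity(feats, k=10, sigma=500.0):
    """Build a sparse k-NN affinity matrix with a Gaussian kernel.
    The paper uses FAISS for the neighbour search; brute-force NumPy
    distances are used here so the sketch has no extra dependencies."""
    n = feats.shape[0]
    # pairwise squared Euclidean distances via the expansion trick
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    np.fill_diagonal(d2, np.inf)           # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbours per row
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-d2[i, nbrs[i]] / sigma)
    return np.maximum(W, W.T)              # symmetrise the graph
```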
Experiment Setup Yes In the CPL-IQA, we choose ResNet-101 as the backbone... k is set to 10 and σ = 500 in the Nearest Neighbor Graph Construction with kNN... We let γ = 0.99 in the process of Label Optimizing. The dimension d of features extracted by the FC layer after the backbone is 256... The batch size B = 64 in Stage 1, and Stage 2 is performed with B = BL + BU, where BL = 8 and BU = 56 denote the number of labeled and unlabeled images in one batch. The training is conducted for just 10 epochs in total with SGD optimization, including 5 epochs in Stage 1 and 5 epochs in Stage 2. Meanwhile, we resize all the images into 256×256 and randomly crop 10 sub-images to the size of 224×224, and we initialize the backbone by the pre-training weights obtained by the classification task on ImageNet (Deng et al., 2009) before training in Stage 1.
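The Stage-2 batch composition described above (B = BL + BU with BL = 8 labeled and BU = 56 unlabeled images) can be sketched as a simple batch generator. This is an illustration under the stated batch sizes, not the paper's actual data-loading code:

```python
import random

def mixed_batches(labeled, unlabeled, bl=8, bu=56, seed=0):
    """Yield Stage-2 training batches of BL labeled and BU unlabeled
    samples (BL + BU = 64, matching the paper's batch size B)."""
    rng = random.Random(seed)
    lab, unlab = labeled[:], unlabeled[:]
    rng.shuffle(lab)
    rng.shuffle(unlab)
    # stop when either pool can no longer fill its share of a batch
    n_batches = min(len(lab) // bl, len(unlab) // bu)
    for i in range(n_batches):
        yield lab[i * bl:(i + 1) * bl], unlab[i * bu:(i + 1) * bu]
```

In an actual PyTorch pipeline the two index streams would typically feed two samplers whose outputs are concatenated per step; the generator above only shows the 8/56 bookkeeping.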