CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

Authors: Yating Liu, Yujie Zhang, Ziyu Shan, Yiling Xu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches. We conduct comprehensive experiments on multiple benchmarks. Experimental results indicate that CLIP-PCQA achieves superior performance and further analyses reveal the model's robustness under different settings.
Researcher Affiliation Academia Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China EMAIL
Pseudocode No The paper describes the proposed method using text, mathematical formulations, and diagrams, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code https://github.com/Olivialyt/CLIP-PCQA
Open Datasets Yes To illustrate the effectiveness of our method, we employ three benchmarks with available raw opinion scores: SJTU-PCQA (Yang et al. 2020a), LS-PCQA Part I (Liu et al. 2023b) and BASICS (Ak et al. 2024).
Dataset Splits Yes We partition the databases according to content (reference point clouds) and k-fold cross-validation is used for training. Specifically, 9-fold cross-validation is applied for SJTU-PCQA following (Zhang et al. 2022b), and we adopt a 5-fold cross-validation both for LS-PCQA and BASICS. For each fold, the test performance with minimal training loss is recorded and the average result across all folds is recorded to alleviate randomness.
Hardware Specification No The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. It only mentions general training strategies.
Software Dependencies No The paper mentions using a 'Vision Transformer (ViT-B/16)' and 'Adam optimizer', but does not provide specific version numbers for software libraries, frameworks, or programming languages used.
Experiment Setup Yes The initial learning rate is set as 4e-6 and the model is trained for 50 epochs with a default batch size of 16. We use the Adam optimizer (Kingma and Ba 2014) with a weight decay of 1e-4. The number of projection views M = 6 and the images are randomly cropped into 224×224×3 as inputs. We set the number of context tokens W as 16. For the loss function, we set θ = [0.25, 0.50, 0.75]. α is set to 1/K and β is set to 0.08. Depending on the raw score ranges of different databases, we evenly divide them into five thresholds as the quantitative values q. For example, we set q = [5, 4, 3, 2, 1] for LS-PCQA and q = [10, 8, 6, 4, 2] for SJTU-PCQA, respectively.
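The split protocol and experiment setup reported above can be sketched in code. This is a minimal illustration only, not the authors' released implementation (which is at the GitHub link above); all function and variable names (CONFIG, content_kfold, split_samples, quality_anchors) are assumptions introduced here for clarity.

```python
# Hedged sketch of the evaluation protocol and training configuration
# quoted in the reproducibility table. Names are illustrative, not from
# the paper's code release.

# Stated hyperparameters, gathered in one place for reference.
CONFIG = {
    "lr": 4e-6,                    # initial learning rate (Adam)
    "weight_decay": 1e-4,
    "epochs": 50,
    "batch_size": 16,
    "num_views_M": 6,              # projection views per point cloud
    "crop_size": (224, 224, 3),    # random crop fed to the ViT-B/16 backbone
    "context_tokens_W": 16,
    "theta": [0.25, 0.50, 0.75],
    "beta": 0.08,                  # alpha is 1/K, K = number of quality levels
}

def content_kfold(contents, k):
    """Partition reference-content IDs into k folds, matching the
    content-based k-fold cross-validation described in the table
    (9-fold for SJTU-PCQA, 5-fold for LS-PCQA and BASICS)."""
    contents = sorted(set(contents))
    return [contents[i::k] for i in range(k)]

def split_samples(samples, test_contents):
    """Split (content_id, sample) pairs so that every distortion of a
    held-out reference point cloud is excluded from training."""
    train = [s for c, s in samples if c not in test_contents]
    test = [s for c, s in samples if c in test_contents]
    return train, test

def quality_anchors(score_max, score_min=0.0, levels=5):
    """Evenly divide a raw score range into `levels` quantitative
    values q, from best to worst; quality_anchors(10) matches the
    q = [10, 8, 6, 4, 2] used for SJTU-PCQA, and quality_anchors(5)
    matches the q = [5, 4, 3, 2, 1] used for LS-PCQA."""
    step = (score_max - score_min) / levels
    return [score_max - i * step for i in range(levels)]
```

Splitting by reference content (rather than by individual distorted sample) matters for quality assessment: it guarantees the model is always tested on unseen source content, which is what the averaged cross-fold result in the table measures.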