CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
Authors: Yating Liu, Yujie Zhang, Ziyu Shan, Yiling Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches. We conduct comprehensive experiments on multiple benchmarks. Experimental results indicate that CLIP-PCQA achieves superior performance, and further analyses reveal the model's robustness under different settings. |
| Researcher Affiliation | Academia | Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China |
| Pseudocode | No | The paper describes the proposed method using text, mathematical formulations, and diagrams, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/Olivialyt/CLIP-PCQA |
| Open Datasets | Yes | To illustrate the effectiveness of our method, we employ three benchmarks with available raw opinion scores: SJTU-PCQA (Yang et al. 2020a), LS-PCQA Part I (Liu et al. 2023b) and BASICS (Ak et al. 2024). |
| Dataset Splits | Yes | We partition the databases according to content (reference point clouds) and k-fold cross-validation is used for training. Specifically, 9-fold cross-validation is applied for SJTU-PCQA following (Zhang et al. 2022b), and we adopt a 5-fold cross-validation both for LS-PCQA and BASICS. For each fold, the test performance with minimal training loss is recorded and the average result across all folds is recorded to alleviate randomness. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. It only mentions general training strategies. |
| Software Dependencies | No | The paper mentions using a 'Vision Transformer (ViT-B/16)' and the 'Adam optimizer', but does not provide specific version numbers for software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The initial learning rate is set as 4e-6 and the model is trained for 50 epochs with a default batch size of 16. We use the Adam optimizer (Kingma and Ba 2014) with a weight decay of 1e-4. The number of projection views M = 6 and the images are randomly cropped into 224×224×3 as inputs. We set the number of context tokens W as 16. For the loss function, we set θ = [0.25, 0.50, 0.75]. α is set to 1/K and β is set to 0.08. Depending on the raw score ranges of different databases, we evenly divide them into five thresholds as the quantitative values q. For example, we set q = [5, 4, 3, 2, 1] for LS-PCQA and q = [10, 8, 6, 4, 2] for SJTU-PCQA, respectively. |
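The content-based cross-validation protocol quoted under "Dataset Splits" (partitioning by reference point cloud so that distorted versions of one content never appear in both train and test) can be sketched as below. This is an illustrative reimplementation, not the authors' code; the round-robin fold assignment and function name `content_kfold` are assumptions.

```python
# Sketch of content-based k-fold splitting for PCQA databases (illustrative).
# Splitting by reference content keeps all distortions of one point cloud
# in the same fold, avoiding content leakage between train and test.

def content_kfold(contents, k):
    """Yield (train_contents, test_contents) pairs for k-fold CV."""
    folds = [contents[i::k] for i in range(k)]  # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test

# Example: 9-fold CV over 9 reference contents, as quoted for SJTU-PCQA.
contents = [f"ref_{i}" for i in range(9)]
splits = list(content_kfold(contents, 9))
```

Each fold's test-set performance would then be recorded at minimal training loss and averaged across folds, per the quoted protocol.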
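The quoted setup lists the quality-level values q and a temperature β = 0.08 but not the scoring rule itself. A common CLIP-style choice, sketched here as an assumption rather than the paper's confirmed method, is to softmax the K text-prompt similarities with temperature β and take the expectation over q:

```python
# Sketch: turning K quality-level similarities into a scalar quality score.
# Assumption: score = sum_k softmax(sim / beta)_k * q_k, a common CLIP-style
# rule; the exact CLIP-PCQA formulation is not quoted above.
import math

def softmax(logits, beta=0.08):
    # beta acts as a temperature: smaller beta sharpens the distribution.
    scaled = [l / beta for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def expected_score(logits, q, beta=0.08):
    probs = softmax(logits, beta)
    return sum(p * qi for p, qi in zip(probs, q))

q = [5, 4, 3, 2, 1]                 # quality-level values quoted for LS-PCQA
logits = [0.9, 0.7, 0.3, 0.1, 0.0]  # hypothetical similarities to K prompts
score = expected_score(logits, q)
```

With β = 0.08 the softmax is sharp, so the score is dominated by the best-matching quality level while still varying smoothly with the similarities.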