Navigating Conflicting Views: Harnessing Trust for Learning

Authors: Jueqing Lu, Wray Buntine, Yuanyuan Qi, Joanna Dipnall, Belinda Gabbe, Lan Du

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our method on six real-world datasets using Top-1 Accuracy, Fleiss Kappa, and a new metric, Multi-View Agreement with Ground Truth, to assess prediction reliability. We also assess the effectiveness of uncertainty in indicating prediction correctness via AUROC. Additionally, we test the scalability of our method through end-to-end training on a large-scale dataset. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.
Researcher Affiliation Academia (1) Department of Data Science & AI, Monash University; (2) College of Engineering and Computer Science, VinUniversity; (3) School of Public Health and Preventive Medicine, Monash University. Correspondence to: Lan Du <EMAIL>.
Pseudocode Yes Algorithm 1: Algorithm For Training (simplified version); Algorithm 2: Algorithm For Training; Algorithm 3: Algorithm For Testing.
Open Source Code Yes Code available at: https://github.com/OverfitFlow/Trust4Conflict
Open Datasets Yes Following previous work (Han et al., 2021; 2022; Jung et al., 2022; Xu et al., 2024a), we conducted experiments on six benchmark datasets: Handwritten (https://archive.ics.uci.edu/ml/datasets/Multiple+Features), Caltech101 (Fei-Fei et al., 2004), PIE (http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html), Scene15 (Fei-Fei & Perona, 2005), HMDB (Kuehne et al., 2011) and CUB (Wah et al., 2011), with a train-test split of 80% vs. 20%.
Dataset Splits Yes Following previous work (Han et al., 2021; 2022; Jung et al., 2022; Xu et al., 2024a), we conducted experiments on six benchmark datasets: Handwritten, Caltech101 (Fei-Fei et al., 2004), PIE, Scene15 (Fei-Fei & Perona, 2005), HMDB (Kuehne et al., 2011) and CUB (Wah et al., 2011), with a train-test split of 80% vs. 20%. Table 11 (Summary of Datasets) reports, e.g., for Handwritten: Size 2000, K = 10 classes, view dimensions 240/76/216/47/64/6, #Train 1600, #Test 400.
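The 80%/20% split and the Handwritten sizes reported above (2000 samples into 1600 train / 400 test) can be sketched as follows. This is a minimal illustration, not the authors' code; the fixed seed and shuffling strategy are assumptions, since the excerpt does not specify how the split was randomized.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into train/test subsets.

    The seed value is an illustrative assumption; the paper excerpt
    does not state how the 80/20 split was randomized.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * train_frac)
    return idx[:n_train], idx[n_train:]

# Handwritten: 2000 samples -> 1600 train / 400 test, matching Table 11.
train_idx, test_idx = split_indices(2000)
print(len(train_idx), len(test_idx))  # 1600 400
```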
Hardware Specification Yes All methods were run on a single NVIDIA RTX 3090 GPU with 24GB of memory for fair comparison.
Software Dependencies Yes Specifically, we used PyTorch (Paszke et al., 2019) version 1.13.0, built with CUDA 11.7, to implement our code. The Python version is 3.8, and the operating system is Ubuntu 22.04.4.
Experiment Setup Yes Table 10. TF and ETF hyper-parameters:
Hyper-parameter: Handwritten / Caltech101 / PIE / Scene15 / HMDB / CUB
lr: 3e-3 / 1e-4 / 3e-3 / 1e-2 / 3e-4 / 1e-3
rlr: 3e-4 / 3e-5 / 1e-3 / 3e-3 / 1e-4 / 3e-4
weight-decay: 1e-4 for all datasets
warm-up epochs: 1 for all datasets
The Adam optimizer (Kingma & Ba, 2015) is used for updating model parameters with beta coefficients = (0.9, 0.999) and epsilon = 1e-8.
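The Table 10 settings can be gathered into per-dataset optimizer configurations as sketched below. The dictionary layout and the `adam_config` helper are illustrative assumptions; the role of `rlr` (a second learning rate) is not detailed in this excerpt, so it is carried through unchanged.

```python
# Per-dataset learning rates from Table 10. "rlr" is the paper's second
# learning rate; its exact role is not specified in this excerpt.
HPARAMS = {
    "Handwritten": {"lr": 3e-3, "rlr": 3e-4},
    "Caltech101":  {"lr": 1e-4, "rlr": 3e-5},
    "PIE":         {"lr": 3e-3, "rlr": 1e-3},
    "Scene15":     {"lr": 1e-2, "rlr": 3e-3},
    "HMDB":        {"lr": 3e-4, "rlr": 1e-4},
    "CUB":         {"lr": 1e-3, "rlr": 3e-4},
}

def adam_config(dataset):
    """Combine the shared Adam settings (betas, eps, weight decay,
    warm-up) with the dataset-specific learning rates from Table 10."""
    cfg = {
        "betas": (0.9, 0.999),
        "eps": 1e-8,
        "weight_decay": 1e-4,
        "warmup_epochs": 1,
    }
    cfg.update(HPARAMS[dataset])
    return cfg

print(adam_config("Scene15")["lr"])  # 0.01
```

In PyTorch, such a dictionary maps directly onto `torch.optim.Adam(params, lr=cfg["lr"], betas=cfg["betas"], eps=cfg["eps"], weight_decay=cfg["weight_decay"])`.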