Calibrated Disambiguation for Partial Multi-label Learning

Authors: Zhuoming Li, Yuheng Jia, Mi Yu, Zicong Miao

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on Pascal VOC, MS-COCO, NUS-WIDE, and CUB have verified that our method outperforms existing state-of-the-art PML methods.
Researcher Affiliation | Collaboration | Zhuoming Li 1,2; Yuheng Jia 1,2*; Mi Yu 2; Zicong Miao 2. 1 School of Computer Science and Engineering, Southeast University. 2 China Telecom Cloud Computing Corporation, Beijing 100088, China.
Pseudocode | Yes |
Algorithm 1: Algorithm of PML-CD
Input: partial multi-label learning dataset {(x_i, y_i)}
Parameter: pre-trained calibrator g, learning rate η
Output: network f(· | θ) for PML
1: repeat
2:   Update network parameters θ ← θ − η∇L_warmup
3: until early stopping
4: Initialize w_ij ← 1
5: repeat
6:   Collect predicted confidences {t_ij} ← f({x_i})
7:   Compute q, P via Eq. 5
8:   Predict reliability histogram r via Eq. 6
9:   Update curriculum weights w_ij ← P(ŷ_ij = y_ij | t_ij) via Eq. 7, if y_ij = 1
10:  Update network parameters θ ← θ − η∇L_disam
11: until early stopping
Open Source Code | Yes | Code: https://github.com/lee-plus-plus/PML-CD
Open Datasets | Yes | Datasets. To validate our proposed approach, we conducted experiments on several commonly used real-world multi-label image datasets, including MS-COCO 2014 (Lin et al. 2014), Pascal VOC 2007 (Everingham et al. 2010), NUS-WIDE (Chua et al. 2009), and CUB 200 (Wah et al. 2011).
Dataset Splits | No | The paper mentions using standard datasets like MS-COCO 2014, Pascal VOC 2007, NUS-WIDE, and CUB 200 but does not explicitly provide details about how these datasets were split into training, validation, and test sets for the experiments.
Hardware Specification | Yes | Our experiments are run on a GeForce RTX 4090 with PyTorch 1.13.1.
Software Dependencies | Yes | Our experiments are run on a GeForce RTX 4090 with PyTorch 1.13.1.
Experiment Setup | Yes | During training, all images are resized to 224×224, and strong data augmentations are applied to the training set, including horizontal flip, RandAugment (Cubuk et al. 2020), and Cutout. The same data processing is adopted for almost all comparison methods. We train the model with the following optimization settings: an Adam optimizer with a fixed learning rate of 1×10⁻⁴ for VOC and CUB and 1×10⁻⁵ for COCO and NUS-WIDE, and a weight decay of 5×10⁻⁵. The mini-batch size is set to 32. Early stopping is applied for all comparison methods, since noisy-label learning is highly dependent on it (Bai et al. 2021).
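The curriculum-weight step in Algorithm 1 (collect predicted confidences, estimate a per-bin reliability histogram, then set each positive label's weight to its estimated correctness probability) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function name, the bin count, and the toy confidences below are our assumptions.

```python
def reliability_weights(confidences, correctness, n_bins=10):
    """Assign each confidence t_ij a curriculum weight
    w_ij ~ P(label correct | t_ij), read off a per-bin
    reliability histogram (a stand-in for Eqs. 6-7)."""
    # Accumulate mean correctness per confidence bin.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, r in zip(confidences, correctness):
        b = min(int(t * n_bins), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    hist = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    # Each label's weight is the reliability of its confidence bin.
    return [hist[min(int(t * n_bins), n_bins - 1)] for t in confidences]

# Toy example: high-confidence predictions tend to be correct,
# so their labels keep weight 1.0 while low-confidence ones drop to 0.0.
conf = [0.95, 0.9, 0.15, 0.2, 0.85]
reli = [1.0, 1.0, 0.0, 0.0, 1.0]
w = reliability_weights(conf, reli, n_bins=5)  # → [1.0, 1.0, 0.0, 0.0, 1.0]
```

In the full method these weights then scale the per-label terms of the disambiguation loss L_disam before the gradient step in line 10.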
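The reported experiment setup (224×224 resize, horizontal flip, RandAugment, Cutout; Adam with the stated learning rates and weight decay; batch size 32) roughly corresponds to the following PyTorch/torchvision configuration fragment. This is a sketch under assumptions: the placeholder backbone and the use of RandomErasing as a Cutout substitute are ours, not the paper's code.

```python
import torch
from torchvision import transforms

# Training-time augmentation: resize, horizontal flip, RandAugment
# (Cubuk et al. 2020), and a Cutout-style occlusion; torchvision's
# RandomErasing is used here as a stand-in for Cutout.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.RandomErasing(),
])

# Placeholder backbone head; the paper does not specify the model here.
model = torch.nn.Linear(2048, 20)

# Adam with lr 1e-4 for VOC/CUB (1e-5 for COCO/NUS-WIDE), weight decay 5e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-5)

# Mini-batch size 32; early stopping is applied during training.
batch_size = 32
```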