Single-positive Multi-label Learning with Label Cardinality

Authors: Shayan Gharib, Pierre-Alexandre Murena, Arto Klami

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the value of the cardinality side information using standard SPMLL benchmark tasks. Already the first component, which only requires crude estimates of the mean and maximum cardinality, achieves mean average precision (MAP) comparable to several recent SPMLL methods (Cole et al., 2021; Zhou et al., 2022; Kim et al., 2023; Chen et al., 2024). With richer cardinality information, we can obviously improve the results, and our focus is on quantifying the possible gain and studying the relationship between the accuracy of the cardinality estimates and the final classification performance. For example, for the commonly used NUS-WIDE data (Chua et al., 2009), knowing the instance cardinality is sufficient to almost match the accuracy of a model trained on fully labeled data, and we get close even when only assuming the cardinality distribution.
Researcher Affiliation | Academia | Shayan Gharib (EMAIL), Department of Computer Science, University of Helsinki; Pierre-Alexandre Murena (EMAIL), Human-Centric Machine Learning Research Group, Hamburg University of Technology; Arto Klami (EMAIL), Department of Computer Science, University of Helsinki
Pseudocode | Yes | We introduce a novel algorithm for estimating the instance cardinalities, requiring only knowledge of the cardinality distribution, a probability vector P(k). The method is computationally light and does not require any training or other supervision; in particular, we do not need to know the true k_i for any instance. We formulate the problem as perfect bipartite matching between two sets of N elements (Cormen et al., 2022). ... For this problem, we can find the global optimum in O(N log N) time with an algorithm that sequentially matches the candidates (see Appendix A for a proof): iteratively assign the smallest cardinality to the sample with the smallest s_i and remove the corresponding elements from the sets. Since v is ordered by construction, this can be done by sorting s_i and collecting the sorting order in Π.
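The sequential matching described in the quote above can be sketched in a few lines: sort the candidate cardinalities, sort the instance scores, and match rank for rank. This is an illustrative sketch (function and variable names are ours, not the paper's), assuming s holds one score per instance and v holds N candidate cardinalities drawn from P(k).

```python
import numpy as np

def match_cardinalities(s, v):
    """Assign N candidate cardinalities to N instances.

    s : array of per-instance scores, shape (N,)
    v : array of candidate cardinalities drawn from P(k), shape (N,)

    Returns k where k[i] is the cardinality assigned to instance i.
    Sorting both sides and matching rank-for-rank realizes the
    O(N log N) sequential matching: the smallest cardinality goes
    to the instance with the smallest score, and so on.
    """
    v = np.sort(v)           # smallest candidate cardinality first
    order = np.argsort(s)    # the permutation Π: ranks of the scores
    k = np.empty_like(v)
    k[order] = v             # i-th smallest score gets i-th smallest cardinality
    return k
```

Because both arrays are only sorted once, the cost is dominated by the two O(N log N) sorts, matching the stated complexity.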
Open Source Code | Yes | Our implementation is publicly available at https://github.com/shayangharib/SPMLL_with_Label_Cardinality.
Open Datasets | Yes | Data: We use four common SPMLL benchmark datasets for evaluation: Pascal VOC 2012 (VOC) (Everingham et al., 2015), MS COCO 2014 (COCO) (Lin et al., 2014), NUS-WIDE (NUS) (Chua et al., 2009), and CUB-200-2011 (CUB) (Wah et al., 2011).
Dataset Splits | Yes | The test setup closely follows previous SPMLL works, e.g. Cole et al. (2021); Zhou et al. (2022). We randomly split each original training set into our training and validation sets, using 20% for validation. The original validation data is used as the test set. The SPMLL training data is formed by choosing the observed label uniformly at random from the set of true positive labels for each instance, as also assumed by the comparison methods. That is, we do not consider the more general setups with e.g. class-specific weighting studied in the broader positive and unlabeled literature (Elkan & Noto, 2008). The validation and test sets are assumed fully labeled, as in previous works, to enable a fair comparison against the baselines and reduce random variation; see Section 6 for discussion. Table 3 provides full details about the datasets, indicating both the standard information (number of images and C) and the cardinality statistics (mean cardinality and k_max).
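The single-positive training-set construction quoted above (one observed label per instance, drawn uniformly from its true positives) can be sketched as follows. This is our illustrative reading of the setup, not code from the paper's repository.

```python
import numpy as np

def make_single_positive(Y, rng=None):
    """Turn a full multi-label matrix into SPMLL single-positive labels.

    Y : binary matrix of shape (N, C); Y[i, c] = 1 iff class c is a true
        positive of instance i. Each row must have at least one positive.

    Returns a matrix of the same shape with exactly one 1 per row, chosen
    uniformly at random from that row's true positives.
    """
    rng = np.random.default_rng() if rng is None else rng
    S = np.zeros_like(Y)
    for i in range(Y.shape[0]):
        positives = np.flatnonzero(Y[i])          # indices of true positives
        S[i, rng.choice(positives)] = 1           # keep exactly one, uniformly
    return S
```

All other positives become unobserved (0), which is exactly what makes the setting "single-positive" rather than standard multi-label learning.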
Hardware Specification | No | The authors acknowledge CSC IT Center for Science, Finland, for computational resources.
Software Dependencies | No | We use a ResNet-50 backbone pre-trained on ImageNet. Following the same image processing techniques as in all comparison methods, we resize all input images to 448×448 pixels and apply data augmentation of random horizontal flipping (probability 0.5) during training. The output layer of ResNet-50 is replaced by a global average pooling (Lin, 2013) followed by a fully connected layer, and the output dimension is adjusted to match the number of classes in the target dataset. The paper mentions software components like ResNet-50, ImageNet, and the Adam optimizer, but does not provide specific version numbers for any software libraries or environments used for implementation (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For all the methods, the hyperparameters are determined with a grid search using validation MAP as the metric, as detailed in Appendix C.4. For simplicity, we use fixed batch sizes from Zhou et al. (2022) and the learning rate from Chen et al. (2024) for all the methods, focusing on validating the method-specific parameters. We train our model for 10 epochs for CS and CS-CD, and for 20 epochs for CS-IC, using the Adam optimizer and established training parameters from prior work without further optimization. Following Zhou et al. (2022), we use batch sizes of 8, 16, 16, and 8 for VOC, COCO, NUS, and CUB respectively. Consistent with Chen et al. (2024), we fix the learning rates to 1e-5 for VOC, COCO, and NUS, and to 5e-5 for CUB. Table 4: Best hyperparameters for each method. The parameters α and β used in Equations 4 and 5 are computed based on γ, η and ϕ using Equations 6 and 7.
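The per-dataset batch sizes and learning rates quoted above can be collected into a single configuration mapping. This is a convenience sketch of ours (the dictionary name and shape are assumptions), recording only values stated in the text.

```python
# Per-dataset training settings as quoted in the setup description:
# batch sizes from Zhou et al. (2022), learning rates from Chen et al. (2024).
TRAIN_CONFIG = {
    "VOC":  {"batch_size": 8,  "lr": 1e-5},
    "COCO": {"batch_size": 16, "lr": 1e-5},
    "NUS":  {"batch_size": 16, "lr": 1e-5},
    "CUB":  {"batch_size": 8,  "lr": 5e-5},
}

def get_config(dataset):
    """Look up the fixed training settings for a benchmark dataset."""
    return TRAIN_CONFIG[dataset]
```

Epoch counts are method-specific (10 for CS and CS-CD, 20 for CS-IC) and so are kept out of the per-dataset table.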