Multilabel Classification with Partial Abstention: Bayes-Optimal Prediction under Label Independence

Authors: Vu-Linh Nguyen, Eyke Hüllermeier

JAIR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Experimentally, we show MLC with partial abstention to be effective in the sense of reducing loss when being allowed to abstain. In this section, we present an empirical analysis that is meant to show the effectiveness of our approach to prediction with abstention. To this end, we perform experiments on a set of standard benchmark data sets from the MULAN repository (cf. Table 3), following a standard 10-fold cross-validation procedure.
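The quoted passage describes reducing loss by being allowed to abstain on part of the label set. As an illustration of the general idea (this is a generic thresholding sketch, not the paper's Algorithms 1-3, and the threshold `t` is a hypothetical parameter): under label independence, each label can be predicted from its marginal probability, abstaining when that probability is too close to 1/2.

```python
def partial_abstention(marginals, t=0.1):
    """Illustrative partial-abstention rule (not the paper's exact algorithm):
    given marginal relevance probabilities p_i, predict 1 if p_i > 0.5 + t,
    predict 0 if p_i < 0.5 - t, and abstain (None) otherwise."""
    prediction = []
    for p in marginals:
        if p > 0.5 + t:
            prediction.append(1)      # confidently relevant
        elif p < 0.5 - t:
            prediction.append(0)      # confidently irrelevant
        else:
            prediction.append(None)   # too uncertain: abstain
    return prediction

# Abstain only on the label whose probability is near 0.5.
print(partial_abstention([0.95, 0.48, 0.10, 0.62]))  # → [1, None, 0, 1]
```

Abstaining on the uncertain label avoids the expected loss of committing to a coin-flip prediction, which is the sense in which partial abstention "reduces loss" in the quoted experiments.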
Researcher Affiliation Academia Vu-Linh Nguyen EMAIL Department of Mathematics and Computer Science Eindhoven University of Technology, The Netherlands Eyke Hüllermeier EMAIL Department of Computer Science University of Munich (LMU), Germany
Pseudocode Yes Algorithm 1: BOP of the generalized rank loss; Algorithm 2: Determining a BOP of the generalized measure F_β; Algorithm 3: Determining a BOP of the generalized measure F_Jac
Open Source Code No The paper provides links to third-party implementations used (e.g., http://scikit.ml/api/skmultilearn.base.problem_transformation.html for BR, and https://github.com/julilien/MLCAmazon From Space for EVGG16), but not for the authors' specific methodology or code developed for this paper.
Open Datasets Yes To this end, we perform experiments on a set of standard benchmark data sets from the MULAN repository (cf. Table 3), following a standard 10-fold cross-validation procedure (http://mulan.sourceforge.net/datasets.html). The data set consists of 2,000 natural scene images, where the class labels are desert, mountains, sea, sunset and trees (https://www.lamda.nju.edu.cn/data_MIMLimage.ashx?AspxAutoDetectCookieSupport=1).
Dataset Splits Yes To this end, we perform experiments on a set of standard benchmark data sets from the MULAN repository (cf. Table 3), following a standard 10-fold cross-validation procedure. Binary relevance (BR) learning and ensembles of classifier chains (ECC) are employed as the MLC classifiers on these data sets. In addition, we perform experiments on an image data set, on which we train ensembles of convolutional neural networks. The data set consists of 2,000 natural scene images, where the class labels are desert, mountains, sea, sunset and trees. Because training deep networks is much more complex, we conduct a 3-fold (instead of a 10-fold) cross-validation procedure.
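The k-fold cross-validation procedure quoted above can be sketched in plain Python (the paper's experiments presumably rely on a library implementation; this round-robin index split is only illustrative):

```python
def k_fold_indices(n_samples, k=10):
    """Partition sample indices into k folds by round-robin assignment;
    each fold serves once as the test set while the remaining indices
    form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test

# 10-fold split of the 2,000 scene images: each test fold holds 200 examples.
splits = list(k_fold_indices(2000, k=10))
```

The 3-fold variant used for the deep networks corresponds to `k_fold_indices(2000, k=3)`; only the number of folds changes.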
Hardware Specification No The paper mentions using 'VGG16-based convolutional neural network classifiers', but does not specify any particular hardware (GPU/CPU model, memory, etc.) used for training or inference.
Software Dependencies No The paper mentions several software tools and libraries like 'sklearn' and 'scikit.ml' but does not provide specific version numbers for any of them. For example: 'logistic regression (in its default setting in sklearn, i.e., with regularisation parameter set to 1)'.
Experiment Setup Yes We perform experiments with two variants of BR, namely with logistic regression (in its default setting in sklearn, i.e., with regularisation parameter set to 1) as the base learner (BR+LR) and with support vector machines, using Platt-scaling (Lin et al., 2007; Platt, 1999) to turn scores into probabilities, as a base learner (BR+SVM). The cardinality M of the ECCs is set to 50. The network consists of a VGG16-based encoder (pretrained on ImageNet) whose layers are frozen, except the last convolutional block. Moreover, we add a fully connected classification head including a 128-neuron hidden dense layer and the 5-neuron classification head. The networks are trained using SGD with Nesterov momentum (lr = 10^-6, momentum = 0.9) for 100 epochs. The loss is binary cross-entropy for multi-label classification. The input images are resized to 128 × 128 × 3.
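The loss named in the quoted setup, binary cross-entropy for multi-label classification, treats the 5-neuron sigmoid head as five independent Bernoulli outputs. A minimal pure-Python sketch of that loss (illustrative only; the actual VGG16-based network and its training are not reproduced here):

```python
import math

def sigmoid(z):
    """Logistic activation applied to each output neuron's raw score."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(logits, targets):
    """Multi-label BCE: one independent binary cross-entropy term per label,
    averaged over the labels of the classification head."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Five labels (desert, mountains, sea, sunset, trees); logits are raw scores.
loss = binary_cross_entropy([2.0, -1.5, 0.3, -3.0, 1.0], [1, 0, 1, 0, 1])
```

Because each label contributes its own term, this loss is consistent with the marginal (per-label) probabilities that the label-independence analysis in the paper builds on.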