On Multilabel Classification and Ranking with Bandit Feedback

Authors: Claudio Gentile, Francesco Orabona

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Though the emphasis is on theoretical results, we also validate our algorithms on real-world multilabel data sets under several experimental conditions: data set size, label set size, loss functions, training mode and performance (online vs. batch), label generation model (linear vs. logistic). Under all such conditions, our algorithms are contrasted against the corresponding multilabel/ranking baselines that operate with full information, often showing (surprisingly enough) comparable prediction performance.
Researcher Affiliation | Academia | Claudio Gentile, DiSTA, Università dell'Insubria, via Mazzini 5, 21100 Varese, Italy; Francesco Orabona, Toyota Technological Institute at Chicago, 6045 South Kenwood Avenue, Chicago, IL 60637, USA
Pseudocode | Yes | Figure 1: The partial feedback algorithm in the (ordered) multiple label setting: the linear model case. Figure 2: The partial feedback algorithm in the (ordered) multiple label setting: the generalized linear model case.
Open Source Code | No | The paper does not provide explicit statements about open-sourcing the code, nor does it include links to a code repository.
Open Datasets | Yes | We used three diverse multilabel data sets, intended to represent different real-world conditions. The first one, called Mediamill, was introduced in a video annotation challenge (Snoek et al., 2006). [...] The second data set is the music annotated Sony CSL Paris data set (Pachet and Roy, 2009), [...] The third one is the smaller Yeast data set (Elisseeff and Weston, 2002).
Dataset Splits | Yes | The first one, called Mediamill, was introduced in a video annotation challenge (Snoek et al., 2006). It comprises 30,993 training samples and 12,914 test ones. [...] The second data set is the music annotated Sony CSL Paris data set (Pachet and Roy, 2009), made up of 16,452 training samples and 16,519 test samples, [...] The third one is the smaller Yeast data set (Elisseeff and Weston, 2002), made up of 1,500 training samples, 917 test samples, with d = 103 and K = 14.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper does not mention specific software names with version numbers required to replicate the experiment.
Experiment Setup | Yes | For the practical implementation of the algorithm in Figure 2, we simplified the formula for ϵ_{i,t}^2. [...] where α is a parameter that we found by cross-validation on each data set across the range α = 2^-8, 2^-7, ..., 2^7, 2^8, for each choice of the label-generation model, loss setting, and value of S (see below). We have considered two different loss functions L, the square loss and the logistic loss (denoted by Log Loss in our plots). [...] In the logistic case, it makes sense in practice not to place any restrictions on the margin domain D, so that we set R = ∞. Again, because our upper bounding analysis would yield as a consequence c_L = 0, we instead set c_L to a small positive constant, specifically c_L = 0.1, with no special attention to its fine-tuning. The setting of the cost function c(i, s) depends on the task at hand, and we decided to evaluate two possible settings. The first one, denoted by "decreasing", is c(i, s) = (s - i + 1)/s, i = 1, ..., s; the second one, denoted by "constant", is c(i, s) = 1, for all i and s. In all experiments with ℓ_{a,c}, the a parameter was set to 0.5 [...] We did so by imposing, for all t, an upper bound S_t = S on |Ŷ_t|. For each of the three data sets, we tried out the four different values of S reported in the last four columns of Table 1.
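The two cost settings and the α cross-validation grid quoted above can be sketched in a few lines. This is a hedged illustration, not code from the paper (which released none); the function names `cost_decreasing`, `cost_constant`, and `alpha_grid` are hypothetical, chosen only to mirror the quoted formulas.

```python
def cost_decreasing(i: int, s: int) -> float:
    """'Decreasing' cost setting: c(i, s) = (s - i + 1) / s, for i = 1, ..., s.
    Early slots in the ranking are charged more than later ones."""
    return (s - i + 1) / s

def cost_constant(i: int, s: int) -> float:
    """'Constant' cost setting: c(i, s) = 1 for all i and s."""
    return 1.0

# Cross-validation grid for alpha: 2^-8, 2^-7, ..., 2^7, 2^8 (17 values).
alpha_grid = [2.0 ** k for k in range(-8, 9)]

# Example: decreasing costs over the s = 3 slots.
# c(1,3) = 3/3, c(2,3) = 2/3, c(3,3) = 1/3
print([cost_decreasing(i, 3) for i in range(1, 4)])
```

Note that under the "decreasing" setting the first predicted label always has cost 1 and the last has cost 1/s, matching the paper's intent of penalizing mistakes near the top of the ranking more heavily.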