Multiple-Instance Learning from Distributions
Authors: Gary Doran, Soumya Ray
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive empirical evaluation that supports the theoretical predictions entailed by the new framework. The proposed theoretical framework leads to a better understanding of the relationship between the MI and standard supervised learning settings, and it provides new methods for learning from MI data that are more accurate, more efficient, and have better understood theoretical properties than existing MI-specific algorithms. Our evaluation uses 55 data sets from a wide variety of domains, and supports both our theoretical results and the assumptions made by our generative model. |
| Researcher Affiliation | Academia | Gary Doran EMAIL Soumya Ray EMAIL Department of Electrical Engineering and Computer Science Case Western Reserve University 10900 Euclid Ave, Glennan 320 Cleveland, OH 44106, USA |
| Pseudocode | No | The paper contains numerous theorems, lemmas, and proofs, but no clearly labeled 'Pseudocode' or 'Algorithm' sections, nor any structured code-like blocks. |
| Open Source Code | No | We use the authors' original MATLAB code, found at http://lamda.nju.edu.cn/code_KISVM.ashx, for the key instance SVM (KI-SVM) approach (Liu et al., 2012). |
| Open Datasets | Yes | To evaluate our hypothesis that a supervised SVM can perform well with respect to AUC for learning instance- and bag-labeling functions, we use a total of 55 real-world data sets across a variety of problem domains, including 3D-QSAR (Dietterich et al., 1997), CBIR (Andrews et al., 2003; Maron and Ratan, 1998; Rahmani et al., 2005), text categorization (Andrews et al., 2003; Settles et al., 2008), and audio classification (Briggs et al., 2012). |
| Dataset Splits | Yes | We evaluate algorithms using 10-fold stratified cross-validation, with 5-fold inner-validation used to select parameters using random search (Bergstra and Bengio, 2012). |
| Hardware Specification | No | This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. |
| Software Dependencies | No | The experiments used for this work were implemented in Python using NumPy (Ascher et al., 2001) and SciPy (Jones et al., 2001) for general matrix computations and the CVXOPT library (Dahl and Vandenberghe, 2009) for solving quadratic programs (QPs). |
| Experiment Setup | Yes | Parameter selection is performed with respect to bag-level labels (since instance-level labels are unavailable at training time, even during cross-validation). We use the radial basis function (RBF) kernel with all algorithms, with scale parameter γ ∈ [10⁻⁶, 10¹], and regularization loss trade-off parameter C ∈ [10⁻², 10⁵]. The L2 norm is used for regularization in all algorithms. |
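The evaluation protocol quoted above (10-fold stratified cross-validation, 5-fold inner validation, random search over the RBF-SVM parameter ranges, AUC scoring) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic stand-in data, not the authors' implementation (which used NumPy/SciPy with CVXOPT); the number of random-search samples and the data set are assumptions.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for one of the paper's 55 data sets (hypothetical).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Parameter ranges stated in the paper: gamma in [1e-6, 1e1], C in [1e-2, 1e5],
# sampled log-uniformly via random search (Bergstra and Bengio, 2012).
param_dist = {"gamma": loguniform(1e-6, 1e1), "C": loguniform(1e-2, 1e5)}

outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in outer.split(X, y):
    search = RandomizedSearchCV(
        SVC(kernel="rbf"),               # RBF kernel, as in the paper
        param_dist,
        n_iter=10,                       # random-search budget (assumed, not stated)
        cv=StratifiedKFold(n_splits=5),  # 5-fold inner validation
        scoring="roc_auc",
        random_state=0,
    )
    search.fit(X[train_idx], y[train_idx])
    scores = search.decision_function(X[test_idx])
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"mean AUC over 10 outer folds: {np.mean(aucs):.3f}")
```

Note that in the actual MI setting, stratification and parameter selection are done at the bag level, since instance labels are unavailable at training time.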