Unbiased Generative Semi-Supervised Learning
Authors: Patrick Fox-Roberts, Edward Rosten
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now examine the performance of the objective function given in Section 4 on real-world data sets, compared to standard semi-supervised learning, supervised learning, and several other alternative semi-supervised techniques. To maximally highlight the effect of mismatch between the model and the true distribution, a simple marginal distribution consisting of a single axis-aligned Gaussian was chosen to model each class. The following learning schemes were tested with this model: our unbiased semi-supervised expression (SSunb), that is, the natural log of Equation (20); the log likelihood of the labelled data (LL), that is, Equation (1); the log likelihood of the standard (biased) semi-supervised expression (SSb), that is, the natural log of Equation (3); the log likelihood of the standard semi-supervised expression plus an Entropy Regularisation term (Grandvalet and Bengio, 2006) with the parameter λ set by 5-fold cross-validation, selecting the λ with the lowest holdout-set error rate (ERer); Entropy Regularisation as before, except cross-validation is carried out on the log likelihood of the holdout set (ERnll); the semi-supervised equivalent of Multi Conditional learning (as investigated in Druck et al., 2007), again cross-validating hyperparameters once on error rate (MCer) and once on log likelihood (MCnll); and the log likelihood of the standard semi-supervised expression plus an Expectation Regularisation (Mann and McCallum, 2007) term (XR), with the trade-off parameter set (after some experimentation) as in the original paper to the equivalent of 10 times the number of labelled samples. Additionally, for the position parameter µ of each Gaussian a penalty term C||µ||² was added onto each objective function, with C set to a small constant (10⁻⁵). We would point out that many of these learning schemes were originally designed for use with a discriminative model. 
Here we are using them in a different manner, to augment the objective function during the learning of a generative model. They have been selected due to their reported good performance in improving discriminative learning, in the hope that this will counteract the bias introduced by the missing class information in the likelihood of the unlabelled samples. We chose 7 data sets from the UCI repository (Frank and Asuncion, 2010): Diabetes, Wine, glass identification (Glass), blood transfusion (Blood) (Yeh et al., 2009), Ecoli, Haberman survival (Haber), and Pima Indian diabetes (Pima); and 2 from libsvm: SVM guide 1 (SVMg) (Hsu et al., 2003) and fourclass (Four) (Ho and Kleinberg, 1996). Due to computational constraints, data sets with > 3 classes had one or more classes merged to create 3 approximately equally sized groupings. Each axis of the data was transformed to lie in the range [−1, 1]. Samples with missing attributes were excluded. Where a data set had a dedicated test set, this was used; otherwise, one fifth of the data was randomly separated a priori for this purpose. A range of values of NL and NU was trialled. As a proportion of the total available training data, NL varied over [0.025, 0.05, 0.1, 0.2], and NU over [0.025, 0.05, 0.1, 0.2, 0.4, 0.8], with NU being formed by discarding labels prior to training (for example, a test where NL = 0.05 and NU = 0.4 would indicate 0.45 of the available data was used for training, of which one ninth was labelled). For each repetition a random set of parameters was generated and used as the starting point for each of the above learning schemes. Each model was optimised by repeatedly alternating between a small number of iterations of downhill simplex search (Lagarias et al., 1998) and a large number of iterations of BFGS search (Nocedal and Wright, 1999), until convergence. This process was repeated 100 times for each combination of NL and NU values. 
The error rate and negative log likelihood of the test set were found for each solution. A selection of these results is shown here. Full results over all test sets are included in the appendix. |
| Researcher Affiliation | Collaboration | Patrick Fox-Roberts EMAIL Cambridge University Engineering Department Trumpington Street Cambridge, CB2 1PZ, UK Edward Rosten EMAIL Computer Vision Consulting 7th floor 14 Bonhill Street London, EC2A 4BX, UK |
| Pseudocode | No | The paper describes algorithms and derivations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | Yes | We chose 7 data sets from the UCI repository (Frank and Asuncion, 2010); Diabetes, Wine, glass identification (Glass), blood transfusion (Blood) (Yeh et al., 2009), Ecoli, Haberman survival (Haber), and Pima Indian diabetes (Pima); and 2 from libsvm: SVM guide 1 (SVMg) (Hsu et al., 2003) and fourclass (Four) (Ho and Kleinberg, 1996). |
| Dataset Splits | Yes | Where a data set had a dedicated test set, this was used; otherwise, one fifth of the data was randomly separated a priori for this purpose. A range of values of NL and NU were trialled. As a proportion of the total available training data, NL varied from [0.025, 0.05, 0.1, 0.2], and NU from [0.025, 0.05, 0.1, 0.2, 0.4, 0.8], with NU being formed by discarding labels prior to training (for example, a test where NL = 0.05 and NU = 0.4 would indicate 0.45 of the available data was used for training, of which one ninth was labelled). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory amounts) used for running experiments. It mentions computational constraints but no specifics. |
| Software Dependencies | No | The paper mentions optimization methods like 'downhill simplex search (Lagarias et al., 1998)' and 'BFGS search (Nocedal and Wright, 1999)' which are algorithms, but does not specify any software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | Additionally, for the position parameter µ of each Gaussian a penalty term C||µ||² was added onto each objective function, with C set to a small constant (10⁻⁵). For each repetition a random set of parameters was generated and used as the starting point for each of the above learning schemes. Each model was optimised by repeatedly alternating between a small number of iterations of downhill simplex search (Lagarias et al., 1998) and a large number of iterations of BFGS search (Nocedal and Wright, 1999), until convergence. This process was repeated 100 times for each combination of NL and NU values. |
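The optimisation scheme quoted in the table (alternating a small number of downhill-simplex iterations with a large number of BFGS iterations until convergence, with a small C||µ||² penalty on the objective) can be sketched with SciPy. The iteration counts, convergence test, and toy quadratic objective below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def alternating_optimise(objective, theta0, simplex_iters=20,
                         bfgs_iters=500, tol=1e-8, max_rounds=50):
    """Alternate Nelder-Mead (downhill simplex) and BFGS until the
    objective stops improving. Iteration counts are illustrative."""
    theta = np.asarray(theta0, dtype=float)
    prev = objective(theta)
    for _ in range(max_rounds):
        # a small number of downhill simplex iterations
        res = minimize(objective, theta, method="Nelder-Mead",
                       options={"maxiter": simplex_iters})
        # followed by a large number of BFGS iterations
        res = minimize(objective, res.x, method="BFGS",
                       options={"maxiter": bfgs_iters})
        theta = res.x
        if abs(prev - res.fun) < tol:  # convergence test (assumed form)
            break
        prev = res.fun
    return theta

# Toy usage: a quadratic objective with the paper's small L2 penalty
# C*||mu||^2 (C = 1e-5) on the position parameter mu.
C = 1e-5
f = lambda mu: np.sum((mu - 2.0) ** 2) + C * np.sum(mu ** 2)
mu_hat = alternating_optimise(f, np.zeros(3))
```

In practice the objective would be the negative of the chosen semi-supervised log-likelihood expression rather than this toy quadratic; the alternation is useful because the gradient-free simplex steps can escape regions where BFGS stalls.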