Minimum Density Hyperplanes
Authors: Nicos G. Pavlidis, David P. Hofmeyr, Sotiris K. Tasoulis
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate its performance on a range of benchmark data sets. The proposed approach is found to be very competitive with state-of-the-art methods for clustering and semi-supervised classification. Experimental results are presented in Section 5. |
| Researcher Affiliation | Academia | Nicos G. Pavlidis EMAIL Department of Management Science Lancaster University Lancaster, LA1 4YX, UK; David P. Hofmeyr EMAIL Department of Mathematics and Statistics Lancaster University Lancaster, LA1 4YF, UK; Sotiris K. Tasoulis EMAIL Department of Applied Mathematics Liverpool John Moores University, Liverpool, L3 3AF, UK |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include a clearly labeled pseudocode block or algorithm section with structured steps. |
| Open Source Code | Yes | The underlying code and data are openly available from Lancaster University data repository at http://dx.doi.org/10.17635/lancaster/researchdata/97. |
| Open Datasets | Yes | Details of benchmark data sets: size (n), dimensionality (d), number of clusters (c). UCI machine learning repository: https://archive.ics.uci.edu/ml/datasets.html |
| Dataset Splits | Yes | For each value of ℓ, 30 random partitions into labelled and unlabelled data are considered. As classes are balanced in the data sets considered, performance is measured only in terms of classification error on the unlabelled data. For data sets with more than two classes all pairwise combinations of classes are considered and aggregate performance is reported. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory details) for running its experiments. |
| Software Dependencies | No | The paper discusses various algorithms and methods (e.g., k-means++, LDA-km, iSVR-L and iSVR-G, Normalised cut spectral clustering (SCn), Laplacian Regularised Support Vector Machines (LapSVM), Simple Semi-Supervised Learning (SSSL), Correlated Nyström Views (XNV)), but does not provide specific version numbers for any software libraries, programming languages, or tools used in the implementation. |
| Experiment Setup | Yes | In all experiments we set the bandwidth parameter to h = 0.9 σ̂_pc1 n^(−1/5), where σ̂_pc1 is the estimated standard deviation of the data projected onto the first principal component. This bandwidth selection rule is recommended when the density being approximated is assumed to be multimodal (Silverman, 1986). The parameter η controls the distance between the minimisers of arg min_{b∈ℝ} f^CL(v, b) and arg min_{b∈F(v)} Î(v, b), while larger values of ε increase the smoothness of the penalised function f^CL. Values of η close to zero affect the numerical stability of the one-dimensional optimisation problem, due to the term L_{η,ε} in f^CL becoming very large. We used η = 10^−2 and ε = 1×10^−6 to avoid numerical instability. The penalty parameter γ is first set to 0.1 and with this setting α is progressively increased in the same way as for clustering. After this, α is kept at α_max and γ is increased to 1 and then 10. |
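The bandwidth rule quoted above can be made concrete with a short sketch. This is an illustration written for this report, not the authors' released code; the function name `mdh_bandwidth` is our own, and the first principal component is obtained here via an SVD of the centred data.

```python
import numpy as np

def mdh_bandwidth(X):
    """Sketch of the quoted rule h = 0.9 * sigma_pc1 * n**(-1/5),
    where sigma_pc1 is the standard deviation of the data projected
    onto the first principal component. Illustrative only."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre the data
    # Singular values of the centred matrix give the PC standard
    # deviations up to a factor of sqrt(n - 1).
    s = np.linalg.svd(Xc, compute_uv=False)
    sigma_pc1 = s[0] / np.sqrt(n - 1)
    return 0.9 * sigma_pc1 * n ** (-1.0 / 5.0)
```

Under this rule the bandwidth shrinks slowly with sample size (at rate n^(−1/5)), which matches Silverman's multimodal-density recommendation cited in the row above.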