New Learning Methods for Supervised and Unsupervised Preference Aggregation

Authors: Maksims N. Volkovs, Richard S. Zemel

JMLR 2014

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically validate the models on rank aggregation and collaborative filtering data sets and demonstrate superior empirical accuracy.
Researcher Affiliation Academia Maksims N. Volkovs (EMAIL), Richard S. Zemel (EMAIL), University of Toronto, 40 St. George Street, Toronto, ON M5S 2E4
Pseudocode Yes Algorithm 1 Feature-Based Learning Algorithm Algorithm 2 CRF Learning Algorithm
Open Source Code Yes The code for all models introduced in this paper is available at www.cs.toronto.edu/~mvolkovs.
Open Datasets Yes For the rank aggregation problem we use the LETOR (Liu et al., 2007a) benchmark data sets. For the collaborative filtering experiments we used the MovieLens data set (Herlocker et al., 1999).
Dataset Splits Yes Each data set comes with five precomputed folds with 60/20/20 splits for training/validation/testing.
Hardware Specification No The paper does not provide specific details about the hardware used for running the experiments. It discusses runtime comparisons between methods but does not specify the underlying hardware.
Software Dependencies No The paper mentions specific methods such as LambdaRank, but it does not specify software dependencies with version numbers (e.g., Python version, or library versions such as scikit-learn, TensorFlow, or PyTorch).
Experiment Setup Yes For all models we found that 100 steps of gradient descent were enough to obtain the optimal results. To avoid constrained optimization we reparametrized the variance parameters as γni = exp(βni) and optimized βni instead. This reparametrization was done for all the reported experiments. Throughout all experiments we used samples from a Gaussian with mean 0 and standard deviation 0.01 to initialize the parameters, and found that the difference in results across multiple restarts was negligible. For the SVD-based model we found through cross-validation that setting p = 1 (SVD rank) gave the best performance, which is expected considering the sparsity level of the pairwise matrices. The LambdaRank training of the scoring function was run for 200 iterations with a learning rate of 0.01, and validation NDCG@10 was used to choose the best model. For the CRF model we used expected NDCG (see Equation 5) as the target objective and set ϵ = 6, ensuring that at least one document of every relevance label was chosen each time.
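The exp-reparametrization quoted in the setup row above can be sketched on a toy problem. This is an illustration only, not the paper's model: the objective (fitting the variance of a zero-mean 1-D Gaussian by maximum likelihood), the learning rate, and the function name are assumptions; only the γ = exp(β) trick, the 100 gradient steps, and the N(0, 0.01) initialization come from the quoted text.

```python
import math
import random

def fit_variance(data, steps=100, lr=0.05):
    """Toy example (not the paper's model): fit the variance gamma of a
    zero-mean Gaussian by gradient descent. The positivity constraint
    gamma > 0 is removed by optimizing beta with gamma = exp(beta), so
    the gradient steps are unconstrained, as in the quoted setup."""
    random.seed(0)
    beta = random.gauss(0.0, 0.01)  # init ~ N(0, 0.01), as in the paper
    n = len(data)
    sq = sum(x * x for x in data)
    for _ in range(steps):
        gamma = math.exp(beta)  # always positive by construction
        # Negative log-likelihood of N(0, gamma), up to constants:
        #   L = (n/2) * log(gamma) + sq / (2 * gamma)
        dL_dgamma = n / (2 * gamma) - sq / (2 * gamma ** 2)
        # Chain rule: dL/dbeta = dL/dgamma * dgamma/dbeta = dL/dgamma * gamma
        beta -= lr * dL_dgamma * gamma
    return math.exp(beta)

data = [0.5, -1.2, 0.8, -0.3, 1.1]
estimated_var = fit_variance(data)  # converges to the MLE sum(x^2)/n
```

The design point is the one the authors state: optimizing β instead of γ turns a constrained problem (γ > 0) into an unconstrained one, so plain gradient descent applies without projection steps.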