Probabilistic Matrix Factorization for Automated Machine Learning

Authors: Nicolo Fusi, Rishit Sheth, Melih Elibol

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art."
Researcher Affiliation | Collaboration | Nicolo Fusi, Rishit Sheth (Microsoft Research, New England); Melih Elibol (EECS, University of California, Berkeley)
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Data and software available at https://github.com/rsheth80/pmf-automl/"
Open Datasets | Yes | "We ran all of the experiments on 553 OpenML [28] datasets"
Dataset Splits | Yes | "We generated training data for our method by splitting each OpenML dataset in 80% training data, 10% validation data and 10% test data"
Hardware Specification | No | The paper mentions "approximately 3 hours on a 16-core Azure machine" but does not specify exact CPU models, GPU models, or memory details.
Software Dependencies | No | The paper mentions software such as "scikit-learn [17]" and the "auto-sklearn library [4]" but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "We set the number of latent dimensions to Q = 20, stochastic gradient descent learning rate to η = 1e-7, and (column) batch-size to 50. The latent space was initialized using PCA, and training was run for 300 epochs (corresponding to approximately 3 hours on a 16-core Azure machine). Finally, we configured the acquisition function with ξ = 0.012."
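To make the reported setup concrete, the following is a minimal sketch of matrix-factorization training with the paper's stated hyperparameters (Q = 20 latent dimensions, η = 1e-7, 300 epochs, ξ = 0.012) followed by an expected-improvement acquisition step. The matrix sizes, the synthetic data, the random (rather than PCA) initialization, and the constant predictive standard deviation are all assumptions for illustration; the paper's actual model is a non-linear probabilistic matrix factorization, not the plain squared-error factorization shown here.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D datasets x P pipelines performance matrix.
D, P, Q = 30, 40, 20   # Q = 20 latent dimensions (from the paper)
LR = 1e-7              # SGD learning rate eta = 1e-7 (from the paper)
XI = 0.012             # acquisition-function parameter xi (from the paper)
EPOCHS = 300           # training epochs (from the paper)

# Synthetic, partially observed dataset-by-pipeline performance matrix
# (stand-in for the OpenML pipeline-evaluation data).
true_U = rng.normal(size=(D, Q))
true_V = rng.normal(size=(P, Q))
Y = true_U @ true_V.T
observed = rng.random((D, P)) < 0.6

# Latent factors; the paper initializes the latent space with PCA,
# random initialization is used here for brevity.
U = 0.1 * rng.normal(size=(D, Q))
V = 0.1 * rng.normal(size=(P, Q))

for _ in range(EPOCHS):
    R = observed * (U @ V.T - Y)   # residuals on observed entries only
    U -= LR * (R @ V)              # gradient steps on the squared-error loss
    V -= LR * (R.T @ U)

# Expected-improvement acquisition over the unevaluated pipelines of dataset 0.
def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

d = 0
mu = U[d] @ V.T                      # predicted performance per pipeline
best = Y[d][observed[d]].max()       # best performance observed so far
sigma = np.full(P, mu.std() + 1e-9)  # crude stand-in for a predictive std-dev
z = (mu - best - XI) / sigma
ei = (mu - best - XI) * norm_cdf(z) + sigma * norm_pdf(z)
ei[observed[d]] = -np.inf            # only propose unevaluated pipelines
candidate = int(np.argmax(ei))
print("next pipeline to evaluate:", candidate)
```

The ξ term in the expected-improvement expression trades off exploration against exploitation: larger values demand a bigger predicted improvement over the incumbent before a pipeline is selected.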