Comprehensive Algorithm Portfolio Evaluation using Item Response Theory

Authors: Sevvandi Kandanaarachchi, Kate Smith-Miles

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yields an increased understanding of algorithm portfolios.
Researcher Affiliation | Academia | Sevvandi Kandanaarachchi, EMAIL, CSIRO's Data61, Research Way, Clayton VIC 3168, Australia; Kate Smith-Miles, EMAIL, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
Pseudocode | Yes | Algorithm 1: AIRT framework. Input: the matrix $Y_{N \times n}$, containing accuracy measures of n algorithms for N datasets/problem instances. Output: (1) AIRT indicators of algorithms and dataset/problem difficulty; (2) the strengths and weaknesses of algorithms; (3) the airt algorithm portfolio; (4) model goodness measures.
Open Source Code | Yes | As a further contribution, we make this work available in the R package airt (Kandanaarachchi, 2020).
Open Datasets | Yes | In Section 5 we illustrate the complete functionality of AIRT, including the algorithm metrics, problem space analysis, strengths and weaknesses of algorithms, algorithm portfolio evaluation and model goodness results, using the detailed case study of OpenML-Weka classification algorithms and test instances available at the ASlib repository (Bischl et al., 2016). We refer the reader to Appendix A, where further results are summarized on nine more case studies using a variety of ASlib scenarios.
Dataset Splits | Yes | For each algorithm scenario we use 10-fold cross validation and report the average cross validated performance gap for Shapley, topset and airt portfolios.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It focuses on the methodology and datasets rather than computational environment specifications.
Software Dependencies | Yes | As a further contribution, we make this work available in the R package airt (Kandanaarachchi, 2020). The R package airt fits the continuous IRT models described in Section 2.2 using the updated log-likelihood function and assumption. To fit polytomous models airt uses the functionality of the existing R package mirt (Chalmers, 2012).
Experiment Setup | Yes | For all algorithms in the ASlib repository, certain hyperparameters and parameters were used which we do not vary. Any conclusions we draw about algorithm performance are therefore dependent on the actual algorithm implementation they use.
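
The Dataset Splits entry describes 10-fold cross validation with an averaged performance gap per portfolio. The following is a minimal Python sketch of that evaluation loop under stated assumptions, not the paper's implementation (which is in the R package airt). It assumes the performance gap of an instance is the best performance over all algorithms minus the best performance within the portfolio, with larger values meaning better performance. The selector `top_k_by_mean` is a hypothetical stand-in that picks the k algorithms with the highest mean training performance; it is not claimed to reproduce the Shapley, topset or airt portfolio constructions. All function names and the synthetic matrix are illustrative.

```python
import numpy as np


def performance_gap(perf, portfolio):
    """Mean per-instance gap between the best algorithm overall and the best
    algorithm inside the portfolio (assumed definition, larger = better)."""
    best_overall = perf.max(axis=1)
    best_in_portfolio = perf[:, portfolio].max(axis=1)
    return float(np.mean(best_overall - best_in_portfolio))


def top_k_by_mean(perf_train, k):
    """Hypothetical selector: indices of the k algorithms with the highest
    mean performance on the training instances."""
    return np.argsort(-perf_train.mean(axis=0), kind="stable")[:k]


def cross_validated_gap(perf, select, k, n_folds=10, seed=0):
    """Average the held-out performance gap over n_folds folds of instances.
    Assumes perf has at least n_folds rows so that no fold is empty."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(perf.shape[0]), n_folds)
    gaps = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        portfolio = select(perf[train_idx], k)  # chosen on training folds only
        gaps.append(performance_gap(perf[test_idx], portfolio))
    return float(np.mean(gaps))


# Synthetic N x n performance matrix (50 instances, 6 algorithms), illustrative only.
Y = np.random.default_rng(1).uniform(size=(50, 6))
print(cross_validated_gap(Y, top_k_by_mean, k=3))
```

Selecting the portfolio on the training folds and scoring it on the held-out fold keeps the reported gap from being optimistically biased; any other selector with the same `(perf_train, k)` signature can be passed in place of `top_k_by_mean`.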