Comprehensive Algorithm Portfolio Evaluation using Item Response Theory
Authors: Sevvandi Kandanaarachchi, Kate Smith-Miles
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yields an increased understanding of algorithm portfolios. |
| Researcher Affiliation | Academia | Sevvandi Kandanaarachchi, CSIRO's Data61, Research Way, Clayton VIC 3168, Australia; Kate Smith-Miles, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia |
| Pseudocode | Yes | Algorithm 1: AIRT framework. Input: the matrix $Y_{N \times n}$, containing accuracy measures of n algorithms for N datasets/problem instances. Output: (1) AIRT indicators of algorithms and dataset/problem difficulty; (2) the strengths and weaknesses of algorithms; (3) the airt algorithm portfolio; (4) model goodness measures. |
| Open Source Code | Yes | As a further contribution, we make this work available in the R package airt (Kandanaarachchi, 2020). |
| Open Datasets | Yes | In Section 5 we illustrate the complete functionality of AIRT including the algorithm metrics, problem space analysis, strengths and weaknesses of algorithms, algorithm portfolio evaluation and model goodness results using the detailed case study of OpenML-Weka classification algorithms and test instances available at ASlib repository (Bischl et al., 2016). We refer the reader to Appendix A where further results are summarized on nine more case studies using a variety of ASlib scenarios. |
| Dataset Splits | Yes | For each algorithm scenario we use 10-fold cross validation and report the average cross validated performance gap for Shapley, topset and airt portfolios. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It focuses on the methodology and datasets rather than computational environment specifications. |
| Software Dependencies | Yes | As a further contribution, we make this work available in the R package airt (Kandanaarachchi, 2020). The R package airt fits the continuous IRT models described in Section 2.2 using the updated log-likelihood function and assumption. To fit polytomous models airt uses the functionality of the existing R package mirt (Chalmers, 2012). |
| Experiment Setup | Yes | For all algorithms in the ASlib repository certain hyperparameters and parameters were used which we do not vary. Any conclusions we draw about algorithm performance are therefore dependent on the actual algorithm implementation they use. |
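
The Dataset Splits row describes a 10-fold cross-validated performance gap for algorithm portfolios. The Python sketch below shows one way such a figure could be computed from a performance matrix with N instances as rows and n algorithms as columns. It is not the airt package's implementation. The gap definition used here (best score over all algorithms minus best score within the portfolio, averaged over instances) and the selection rule `top_k_by_mean` are assumptions made for illustration, as are all function names; the airt, Shapley, and topset selection rules from the paper are not reproduced.

```python
import numpy as np


def performance_gap(Y, portfolio):
    """Mean shortfall of a portfolio relative to the full algorithm set.

    Assumed definition: for each instance, the best score over all
    algorithms minus the best score among the portfolio's algorithms.
    Higher scores are taken to be better.
    """
    best_overall = Y.max(axis=1)
    best_in_portfolio = Y[:, portfolio].max(axis=1)
    return float(np.mean(best_overall - best_in_portfolio))


def top_k_by_mean(Y_train, k):
    """Hypothetical stand-in selection rule: the k algorithms with the
    highest mean score on the training instances."""
    return np.argsort(-Y_train.mean(axis=0))[:k]


def cross_validated_gap(Y, k, n_folds=10, seed=0):
    """Average held-out performance gap over n_folds folds of instances."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(Y.shape[0])
    folds = np.array_split(shuffled, n_folds)
    gaps = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = np.concatenate(
            [fold for j, fold in enumerate(folds) if j != i]
        )
        portfolio = top_k_by_mean(Y[train_idx], k)
        gaps.append(performance_gap(Y[test_idx], portfolio))
    return float(np.mean(gaps))
```

Calling `cross_validated_gap(Y, k=3)` on an accuracy matrix `Y` returns a single averaged gap. Under this definition a portfolio containing every algorithm always has a gap of zero, which gives a quick sanity check.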