Challenge design roadmap
Authors: Hugo Jair Escalante, Isabelle Guyon, Addison Howard, Walter Reade, Sebastien Treguer
DMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This document serves as a comprehensive guide for designing and organizing effective challenges, particularly within the domains of machine learning and artificial intelligence. It provides detailed guidelines on every phase of the process, from conception and execution to post-challenge analysis. |
| Researcher Affiliation | Collaboration | Hugo Jair Escalante EMAIL Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico; Isabelle Guyon EMAIL University Paris-Saclay, France, ChaLearn, USA, and Google, USA; Addison Howard EMAIL Kaggle, USA; Walter Reade EMAIL Kaggle, USA; Sébastien Treguer EMAIL INRIA, France |
| Pseudocode | No | The paper describes methods and protocols in natural language and using tables and diagrams (e.g., Table 2: Hierarchy of challenge protocols, Figure 1: Challenge design principal pillars, Figure 1 (Appendix A): Illustration of 5-way 1-shot classification), but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | This document is a 'Challenge design roadmap' providing guidelines for organizing challenges and does not present a specific methodology or system that would have associated source code. While it discusses promoting open-source collaboration for challenges and includes a GitHub link for a dataset within an example challenge proposal, it does not offer open-source code for the roadmap methodology described in this paper. |
| Open Datasets | Yes | The paper's Appendix A, which contains an example challenge proposal, states: 'Our competition will use a meta-dataset called Meta-Album, prepared in parallel with the competition, to be released after the competition ends. Several team members of the competition are also co-authors of this meta-dataset. It consists of 10 domains with three image mother datasets per domain. Although these datasets are publicly available, we selected for test purposes datasets that are not part of past meta-learning benchmarks.' and references a GitHub repository: 'Meta-Album: https://github.com/ihsaan-ullah/meta-album'. It also adds: 'They will be released on OpenML [14] after the competition ends.' |
| Dataset Splits | Yes | The example challenge proposal in Appendix A details the dataset splits across different phases: 'The competition has 3 phases: (1) public phase (release of starting kit and 10 public datasets), (2) feedback phase (participants submit their code to get instant feedback on 10 hidden datasets), and (3) final phase (the last submission of each participant from feedback phase evaluated on 10 new hidden datasets). ... The 10 first are the freshest and will be used for the final test phase; the 10 in the middle will be used in the feedback phase; the last 10 datasets will be released in the public phase.' |
| Hardware Specification | Yes | In Appendix A, under 'Resources provided by organizers', the paper specifies: 'To process participant submissions, we will supply 20 compute workers dedicated to the competition. Each compute worker will be equipped with one GPU NVIDIA RTX 2080Ti, 4 vCPUs and 16 GB DDR4 RAM.' It also mentions: 'We received a Google grant of 100,000 credits, equivalent to approximately 91,575 GPU hours on a Tesla M60 GPU.' |
| Software Dependencies | No | The paper mentions general platforms and tools like 'CodaLab' and a 'GitHub repository' within the context of organizing a challenge and its example proposal. It also refers to 'installing the required packages' for the starting kit, but it does not specify any particular software libraries or dependencies along with their version numbers that would be needed to replicate any experimental results or the challenge setup itself. |
| Experiment Setup | Yes | The example challenge proposal in Appendix A details several experimental setup parameters. Specifically, under 'Tasks and application scenarios', it states: 'However, during meta-validation and meta-testing, the number of ways will range from 2 to 20 with a number of shots ranging from 1 to 20, i.e., during meta-validation and meta-testing, the tasks will be any-way any-shot.' Additionally, in 'Protocol', it mentions a system-level constraint: 'all the submissions will be restricted by 10 GPU-hours of execution'. |
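The split and task protocol quoted in the rows above (30 datasets across three phases; any-way any-shot episodes with 2-20 classes and 1-20 shots) can be sketched as follows. This is a minimal illustration of the quoted proposal, not code from the paper; all function and key names are hypothetical.

```python
import random

def split_phases(datasets):
    """Hypothetical sketch of the 3-phase split described in the proposal:
    30 datasets ordered by freshness; the first 10 go to the final phase,
    the middle 10 to the feedback phase, the last 10 to the public phase."""
    assert len(datasets) == 30
    return {
        "final": datasets[:10],      # freshest, hidden until the final phase
        "feedback": datasets[10:20], # hidden, used for instant-feedback scoring
        "public": datasets[20:],     # released with the starting kit
    }

def sample_episode(class_pool, rng=random):
    """Any-way any-shot episode sampling during meta-validation/meta-testing:
    the number of ways ranges from 2 to 20, shots from 1 to 20."""
    n_way = rng.randint(2, 20)
    n_shot = rng.randint(1, 20)
    classes = rng.sample(class_pool, n_way)
    return {"ways": classes, "shots": n_shot}
```

For example, `split_phases(list_of_30_datasets)` yields the three disjoint 10-dataset pools, and repeated calls to `sample_episode` produce tasks of varying difficulty, matching the 'any-way any-shot' regime described in the Experiment Setup row.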