Optimal and Efficient Binary Questioning for Accelerated Annotation

Authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The method is evaluated using several synthetic and real-world datasets and allows a significant improvement (23-86%) in the annotation efficiency of real-world datasets. Experiments: Here, we summarize our main results. The Appendix includes full numerical results, experimental details, and extra figures. Synthetic: We generate three two-dimensional toy problem variations with dataset size N = 10 for 1000 different seeds. Real With Pretrained f: The IA method is evaluated over the first 6000 samples of the CIFAR10 (Krizhevsky, Hinton et al. 2009) and SVHN (Netzer et al. 2011) datasets. Real With Online f: Then, we turn our attention to the more realistic case of annotating when a trained predictor is not available. In this case, the predictor is trained from scratch and online using the labels as they become available. This case (dubbed ALIA) combines the related tasks of active learning (training a predictor that selects samples to label) and quick annotation (using a predictor to annotate more efficiently) into human-in-the-loop learning with Q&A as the interaction paradigm. The ALIA process yields both annotated data and a trained predictor. For this experiment, we run ALIA over MNIST (LeCun et al. 1998) and Fashion MNIST (Xiao, Rasul, and Vollgraf 2017) (see Figure 5).
Researcher Affiliation Collaboration Franco Marchesoni-Acland1,3, Jean-Michel Morel2, Josselin Kherroubi1, Gabriele Facciolo3 1SLB, AI Lab, Paris 2City University of Hong Kong 3Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, France
Pseudocode Yes We provide the algorithm in the Appendix.
Open Source Code Yes We refer the reader to the supplementary material for the full code and algorithm details.
Open Datasets Yes Real With Pretrained f: The IA method is evaluated over the first 6000 samples of the CIFAR10 (Krizhevsky, Hinton et al. 2009) and SVHN (Netzer et al. 2011) datasets... For this experiment, we run ALIA over MNIST (Le Cun et al. 1998) and Fashion MNIST (Xiao, Rasul, and Vollgraf 2017) (see Figure 5).
Dataset Splits No The paper mentions evaluating on the first 6000 samples of the real datasets, but it does not specify explicit train/validation/test splits.
Hardware Specification No The paper does not provide specific details about the hardware used for experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers.
Experiment Setup No The paper mentions generating synthetic data with N=10 for 1000 different seeds, and evaluating on the first 6000 examples of real datasets. It also states predictors are either 'pretrained' or 'trained from scratch and online'. However, specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings are not provided in the main text.
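To make the binary-questioning idea concrete: when annotation proceeds by yes/no questions about a sample's label, ordering the questions according to a Huffman code built from a predictor's class probabilities minimizes the expected number of questions per sample. The sketch below is purely illustrative (the function name and setup are not from the paper, and the paper's IA/ALIA schemes are more general than per-sample Huffman questioning); it only shows why a confident predictor reduces annotation cost.

```python
import heapq

def huffman_question_count(probs, true_label):
    """Number of yes/no questions needed to identify true_label when
    questions follow a Huffman code built from predicted probabilities.
    Illustrative sketch only, not the paper's actual IA/ALIA algorithm."""
    # Heap items: (probability, tie-breaker, {label: depth-so-far}).
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees adds one question (one level) to every
        # label contained in either subtree.
        merged = {k: v + 1 for k, v in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    _, _, depths = heap[0]
    return depths[true_label]

# A confident predictor identifies the most likely label in one question,
# whereas a uniform predictor needs log2(K) questions for K classes.
confident = [0.85, 0.05, 0.05, 0.05]
uniform = [0.25, 0.25, 0.25, 0.25]
print(huffman_question_count(confident, 0))  # -> 1
print(huffman_question_count(uniform, 0))    # -> 2
```

This is the standard source-coding view of questioning: the expected question count equals the expected Huffman codeword length, which approaches the entropy of the predictive distribution, so better predictors directly translate into fewer annotator interactions.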