Accelerated Deep Active Learning with Graph-based Sub-Sampling

Authors: Dan Kushnir, Shiyun Xu

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments validate our goal of reducing query time while maintaining the highest accuracy. We report DGAL's improved query-time vs. accuracy trade-off and compare it with pivotal SOTA baselines. Additionally, we provide a classical active-learning empirical analysis of the trade-off between the number of queried data points and accuracy, and we report average query times for all baselines and data sets. We provide an ablation study with VAE-SEALS (see Algorithm 3) and with random versions of pool-set restriction. Training details are provided in Table 1 in the appendix.
Researcher Affiliation | Collaboration | Dan Kushnir (Bell Laboratories, Nokia); Shiyun Xu (Department of Applied Mathematics and Computational Science, University of Pennsylvania)
Pseudocode | Yes | We provide the pseudo-code of our method in Algorithm 2. We note that the input to VAE-DGAL includes the labeled and unlabeled pool set, the VAE architecture, and the task classification network f.
Open Source Code | No | The paper does not provide a direct link to a source-code repository for the methodology described in this paper, nor does it include an explicit statement about releasing the code for this work.
Open Datasets | Yes | We experimented with the benchmark data sets MNIST (LeCun et al., 1998), EMNIST (Cohen et al., 2017), SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky et al., 2009), CIFAR100 (Krizhevsky et al., 2009), and Mini-ImageNet (Ravi & Larochelle, 2017). ImageNet (Deng et al., 2009) is a well-known large-scale dataset in computer vision.
Dataset Splits | No | The paper mentions using well-known datasets such as MNIST, CIFAR10, and ImageNet, but it does not explicitly provide the training/validation/test splits (e.g., percentages or sample counts) used for the experiments in the main text. It notes that 'Training details are provided in table 1 in the appendix,' implying such details are not in the main body.
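For illustration, the kind of split specification the paper omits can be pinned down with a seeded shuffle. The 80/10/10 ratio, seed, and function name below are hypothetical defaults chosen for this sketch, not values taken from the paper:

```python
import random

def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Deterministically partition range(n) into train/val/test index lists.

    The 80/10/10 ratio and the seed are illustrative defaults, not values
    reported in the paper.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded shuffle for reproducibility
    n_train = int(train_frac * n)
    n_val = int(val_frac * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Reporting the ratios together with the seed makes the split reconstructible by other groups without shipping index files.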
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | Additional parameters: the batch size B is set at no more than 500 (see the batch size for the diffusion algorithm in Kushnir & Venturi (2023)). The number of epochs is set with a stopping criterion based on the convergence of the loss function.
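The two reported settings, a query batch capped at B = 500 and training stopped once the loss converges, can be sketched generically. The acquisition scores, tolerance, and function names below are assumptions for illustration, not the paper's implementation:

```python
def query_batch(scores, B=500):
    """Select at most B unlabeled-pool indices with the highest acquisition
    scores; the scoring rule itself is whatever the AL method defines."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:min(B, len(scores))]

def train_until_converged(run_epoch, tol=1e-4, max_epochs=1000):
    """Run training epochs until the loss improves by less than `tol`
    (a loss-convergence stopping criterion, as described in the paper)."""
    prev = float("inf")
    epochs = 0
    for _ in range(max_epochs):
        loss = run_epoch()  # one epoch of training; returns the epoch loss
        epochs += 1
        if prev - loss < tol:
            break
        prev = loss
    return epochs
```

Capping the batch at the pool size via `min(B, len(scores))` keeps the selection well defined in the final rounds, when fewer than B unlabeled points remain.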