Adaptive Data Collection for Robust Learning Across Multiple Distributions

Authors: Chengbo Zang, Mehmet Kerem Turkcan, Gil Zussman, Zoran Kostic, Javad Ghaderi

ICML 2025

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
"Extensive evaluations on standard datasets and a real-world testbed for object detection in smart-city intersections validate the consistent performance improvements of our method compared to baselines such as random sampling and various active learning methods." "In this section, we present the experimental results of the three algorithms described in Section 3, compared to other state-of-the-art AL (active learning) algorithms."
Researcher Affiliation: Academia
"1 Department of Electrical Engineering, Columbia University, New York, NY, USA; 2 Department of Civil Engineering, Columbia University, New York, NY, USA. Correspondence to: Chengbo Zang <EMAIL>, Javad Ghaderi <EMAIL>."
Pseudocode: Yes
"Algorithm 1 General Framework of Online Optimization with Adaptive Data Collection
Require: total training rounds T, batch size M, randomly initialized θ_1
1: X_0 ← ∅
2: for t = 1, 2, ..., T do
3:   k_t ← SELECT(θ_t, X_{t-1})
4:   B_t ← {X_1, ..., X_M ~ D_{k_t}}
5:   X_t ← X_{t-1} ∪ B_t
6:   θ_{t+1} ← UPDATE(θ_t, X_t, k_t)
7: end for"
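Algorithm 1 can be sketched in a few lines of Python. This is an illustrative skeleton only: `select`, `sample`, and `update` are hypothetical callables standing in for the paper's SELECT and UPDATE subroutines and the sampling oracle D_{k_t}, which the paper instantiates per task.

```python
def adaptive_collection(select, sample, update, theta, T, M):
    """Sketch of Algorithm 1: each round, SELECT a data source k_t,
    draw a batch of M samples from D_{k_t}, and UPDATE the model on
    the accumulated pool."""
    pool = []                                  # X_0 <- empty pool
    for t in range(1, T + 1):
        k = select(theta, pool)                # k_t <- SELECT(theta_t, X_{t-1})
        batch = [sample(k) for _ in range(M)]  # B_t = {X_1, ..., X_M ~ D_{k_t}}
        pool = pool + batch                    # X_t <- X_{t-1} union B_t
        theta = update(theta, pool, k)         # theta_{t+1} <- UPDATE(theta_t, X_t, k_t)
    return theta, pool
```

Any concrete instantiation supplies a source-selection rule, a per-source sampler, and a model-update step; the loop itself is task-agnostic.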
Open Source Code: No
The paper does not provide a link to source code, nor does it state that code will be released in supplementary materials or upon publication.
Open Datasets: Yes
"We perform image classification on the CIFAR10 dataset (Krizhevsky et al., 2009) with a budget of 10,000 images, where every class is a data source." "We also report the results on the MNIST dataset (LeCun et al., 1998) to test different optimizer configurations and get more insight into the distribution of collected samples from different classes under different algorithms." "We perform object detection on the PASCAL VOC2012 dataset (Everingham et al.) with a budget of 3,000 images." "We perform a simple Visual Question Answering (VQA) task under a budget of 1,000 question-answer pairs from the VQAv2 dataset (Antol et al., 2015)."
Dataset Splits: Yes
"Each algorithm executes 1,000 rounds and collects a batch of 8 samples every 4 rounds under a total budget of 2,000 training images." "All AL algorithms are given an initial labeled pool of 1,000 samples (10% of the budget) and proceed to collect 3,000 samples in each episode from the remaining dataset for three episodes." "For the multi-class object detection task... The MDN algorithm is given an initial labeled pool of 600 samples (20% of the budget) and proceeds to collect 800 samples per episode for three episodes." "We fix the total budget of 3,000 samples and change the number of samples allocated during each episode."
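The schedules above can be sanity-checked against the stated budgets. A minimal arithmetic sketch (the helper name is ours; the numbers come from the excerpts above and from the dataset budgets):

```python
def total_collected(initial_pool, per_episode, episodes):
    """Total labeled samples after an episodic collection schedule."""
    return initial_pool + per_episode * episodes

# Round-based schedule: a batch of 8 samples every 4 rounds, 1,000 rounds.
rounds, batch, every = 1_000, 8, 4
assert (rounds // every) * batch == 2_000          # 2,000-image budget

# AL baselines on CIFAR10: 1,000 initial + 3 episodes of 3,000 each.
assert total_collected(1_000, 3_000, 3) == 10_000  # 10,000-image budget

# MDN on VOC2012: 600 initial + 3 episodes of 800 each.
assert total_collected(600, 800, 3) == 3_000       # 3,000-image budget
```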
Hardware Specification: No
The acknowledgements mention "compute resources from NVIDIA Academic Grant Edge AI for Equitable and Safe Intersections in Urban Metropolises", but no specific GPU or CPU models are provided for the experiments.
Software Dependencies: No
The paper mentions optimizers (Adam, SGD) and model architectures (VGG16, SSD300, YOLOv8, SmolVLM-256M-Base), but does not provide version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup: Yes
"We also observe from Figure 1(b) that the Adam optimizer with a cosine-annealing learning rate scheduler (LRS) and L2 regularization (Reg) provides the smoothest trajectory, which we adopt for the following experiments." "For the optimization step (Line 6 of Algorithm 1), we consider Online Gradient Descent (OGD) (Hazan, 2016). Recall that k_t is the data source selected for the current round t... where η_t := 1/(2L√t) is the learning rate." CIFAR10: "We execute our algorithms for 20,000 rounds and collect a batch of 32 samples every 60 rounds until reaching the budget." PASCAL VOC2012: "We execute our algorithms by pretraining for 10,000 rounds (freezing the backbone), collecting a batch of 8 samples every 50 rounds. Then we finetune for 20,000 rounds, collecting a batch of 8 samples every 100 rounds until reaching the budget."