Data Summarization via Bilevel Optimization

Authors: Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause

JMLR 2024

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantage of our framework over other data summarization techniques in extensive experimental studies, over a wide range of models and resource-constrained settings, such as continual learning, streaming and batch active learning and dictionary selection for compressed sensing. In this section, we demonstrate the flexibility and effectiveness of our framework for a wide range of models and various settings. We start by evaluating the practical variants of Algorithm 1 proposed in Section 3.5, and we compare our method to model-specific coreset constructions and other data summarization strategies in Section 5.2. We then study our approach in the memory-constrained settings of continual learning and streaming in Sections 5.3 and 5.4, of dictionary selection in Section 5.6, and the human-resource-constrained setting of batch active learning in Section 5.5.
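The framework the excerpt refers to selects a small weighted subset by solving a cardinality-constrained bilevel problem. A sketch of the standard bilevel coreset formulation (notation assumed here, not quoted from the report):

```latex
\min_{w \ge 0,\ \|w\|_0 \le m} \; \mathcal{L}\bigl(\theta^*(w)\bigr)
\qquad \text{s.t.} \qquad
\theta^*(w) \in \arg\min_{\theta} \sum_{i=1}^{n} w_i\, \ell\bigl(f_\theta(x_i), y_i\bigr) + \lambda \|\theta\|_2^2,
```

where the outer objective \(\mathcal{L}\) is the loss of the inner solution evaluated on the full data set, and \(\lambda\) is the inner regularizer.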
Researcher Affiliation | Collaboration | Zalán Borsos (Department of Computer Science, ETH Zurich); Mojmír Mutný (Department of Computer Science, ETH Zurich); Marco Tagliasacchi (Google Research); Andreas Krause (Department of Computer Science, ETH Zurich)
Pseudocode | Yes | Algorithm 1: Bilevel Coreset (BiCo) ... Algorithm 2: Bilevel Coreset via Regularization ... Algorithm 3: Streaming BiCo with Merge-Reduce Buffer
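The paper's algorithms themselves are not reproduced here; as a minimal library-free illustration of the bilevel coreset idea, the sketch below runs forward-greedy selection with a closed-form ridge-regression inner solver (the function names, greedy strategy, and data are illustrative, not the authors' implementation):

```python
import numpy as np

def ridge_fit(X, y, lam=1e-7):
    """Inner problem: regularized least squares on the selected subset."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def outer_loss(theta, X, y):
    """Outer objective: loss of the inner solution on the FULL data set."""
    r = X @ theta - y
    return float(r @ r) / len(y)

def greedy_bilevel_coreset(X, y, m, lam=1e-7):
    """Forward-greedy sketch of bilevel coreset selection: at each step,
    add the point whose inclusion most reduces the full-data loss of
    the inner solver refit on the enlarged subset."""
    selected = []
    for _ in range(m):
        best_j, best_val = None, np.inf
        for j in range(len(X)):
            if j in selected:
                continue
            idx = selected + [j]
            theta = ridge_fit(X[idx], y[idx], lam)
            val = outer_loss(theta, X, y)
            if val < best_val:
                best_j, best_val = j, val
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=60)
S = greedy_bilevel_coreset(X, y, m=5)
```

The greedy loop refits the inner problem from scratch for every candidate; the paper's practical variants avoid this cost via implicit differentiation and incremental updates.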
Open Source Code | No | The paper does not explicitly state that the authors release their source code, nor does it provide a direct link to a code repository for the described methodology. It mentions using the library of Novak et al. (2020) and a GitHub link for one of the data sets, but not for the authors' own implementation.
Open Datasets | Yes | We choose four standard binary classification data sets (Dua and Graff, 2017; Uzilov et al., 2006) from the LIBSVM library... For MNIST, we use... For CIFAR-10, we use... For SVHN we only use the train split... The Spoken Digit data set (Jackson, 2016) (2700 utterances, 10 classes) and Speech Commands V2 (Warden, 2018) (85000 utterances, 35 classes) data sets...
Dataset Splits | Yes | We split CIFAR-10 into a train and validation set, where the validation set is a randomly chosen 10% of the original training set... for SVHN we only use the train split, containing approximately 73000 images... For PMNIST, we use a fully connected net... For SMNIST and SCIFAR-10, we use a CNN... We fix the replay memory size m = 100 for tasks derived from MNIST. For SCIFAR-10, we then set the memory size to m = 200... The starting labeled pools are guaranteed to contain at least one sample from each class.
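The 90/10 train/validation split quoted in this row is straightforward to reproduce; a minimal sketch (the seed and function name are illustrative, not taken from the paper):

```python
import numpy as np

def train_val_split(n, val_frac=0.1, seed=0):
    """Hold out a random val_frac of the training indices as validation,
    mirroring the 10% CIFAR-10 validation split described above."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_val = int(round(val_frac * n))
    return perm[n_val:], perm[:n_val]

train_idx, val_idx = train_val_split(50_000)  # CIFAR-10 train-set size
```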
Hardware Specification | Yes | We calculate the corresponding NTKs without batch normalization and pooling with the library of Novak et al. (2020) on a single GeForce GTX 1080 Ti GPU, whereas the coreset selection is performed on a single CPU.
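The paper computes NTKs with the library of Novak et al. (2020); as a dependency-free illustration of what an (empirical) NTK is, the sketch below computes K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩ for a tiny one-hidden-layer ReLU net with explicit gradients (the architecture and sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 4, 16
W = rng.normal(size=(h, d)) / np.sqrt(d)  # hidden-layer weights
a = rng.normal(size=h) / np.sqrt(h)       # output weights

def param_grad(x):
    """Flattened gradient of f(x) = a . relu(W x) w.r.t. (W, a)."""
    z = W @ x
    act = np.maximum(z, 0.0)
    dz = (z > 0).astype(float)
    dW = np.outer(a * dz, x)  # df/dW_kj = a_k 1[z_k > 0] x_j
    da = act                  # df/da_k  = relu(z_k)
    return np.concatenate([dW.ravel(), da])

def empirical_ntk(X):
    """Gram matrix of parameter gradients: K[i, j] = <grad f(x_i), grad f(x_j)>."""
    J = np.stack([param_grad(x) for x in X])
    return J @ J.T

X = rng.normal(size=(8, d))
K = empirical_ntk(X)
```

Being a Gram matrix of gradients, K is symmetric positive semi-definite by construction.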
Software Dependencies | No | The paper mentions specific optimizers such as Adam and SGD and references the library of Novak et al. (2020), but it does not provide version numbers for these components or for general languages and frameworks such as Python or PyTorch.
Experiment Setup | Yes | All variants in Section 3.5 use a λ = 10⁻⁷ regularizer in the inner problem. The inner optimization is performed with Adam using a step size of 0.01 as follows: all variants start with an optimization phase on the initial point set with 5·10⁴ iterations; then, after each step, an additional 10⁴ GD iterations are performed... We use weight decay of 5·10⁻⁴ and an initial learning rate of 0.1 cosine-annealed to 0 over 300·n/m epochs, where n is the full data set size and m is the subset size. Additionally, we use dropout with a rate of 0.4 for SVHN. For CIFAR-10, we use the standard data augmentation pipeline of random cropping and horizontal flipping...
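The learning-rate schedule quoted in this row (initial rate 0.1 cosine-annealed to 0 over 300·n/m epochs) can be sketched directly; the function name and the n, m values below are illustrative:

```python
import math

def cosine_lr(epoch, total_epochs, lr0=0.1):
    """Cosine annealing from lr0 down to 0 over total_epochs."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))

n, m = 50_000, 500       # full-data and subset sizes (illustrative)
total = 300 * n // m     # 300 * n / m epochs, per the quoted setup
```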