Nearly Optimal Sample Complexity for Learning with Label Proportions

Authors: Robert Istvan Busa-Fekete, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren, Uri Stemmer

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we compare our proposed LLP loss to baseline losses on several datasets and models, following closely the setup of Busa-Fekete et al. (2023). ... Figure 1 shows the final test accuracy of each aggregate loss at various bag sizes and numbers of training epochs (some curves are not visible because they overlap one another)."
Researcher Affiliation | Collaboration | (1) Google Research, New York, USA; (2) Tel Aviv University, Israel. Correspondence to: Claudio Gentile <EMAIL>, Travis Dick <EMAIL>.
Pseudocode | Yes | "Algorithm 1: Algorithm for the two-function realizable case. ... Algorithm 2: SGD-based algorithm."
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository. Mentions of other companies' platforms (Apple SKAN, Google Chrome) are not related to the authors' own code release for the methodology described.
Open Datasets | Yes | "We conduct our experiments on the following datasets and models. We use versions of MNIST (Le Cun et al., 2010) and CIFAR-10 (Krizhevsky, 2009) with binary labels, together with the Higgs (Baldi et al., 2014) and UCI Adult (Kohavi & Becker, 1996) datasets, which are already binary tasks."
Dataset Splits | Yes | "We prepare each training dataset by shuffling the data, partitioning it into consecutive bags of size k, and replacing the labels within each bag by their average. ... We use the first 10,000 examples as test data and the remaining examples as training data. ... The data contains 32,561 training examples and 16,281 test examples."
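The bag-preparation step quoted above (shuffle, partition into consecutive bags of size k, replace labels with their bag average) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the function name `make_bags` and the seed handling are assumptions.

```python
import numpy as np

def make_bags(X, y, k, seed=0):
    """Shuffle (X, y), partition the data into consecutive bags of size k,
    and replace each bag's labels with their average (the label proportion).
    Leftover examples that do not fill a final bag are dropped."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    X, y = X[order], y[order]
    n_bags = len(X) // k
    bags = X[: n_bags * k].reshape(n_bags, k, -1)
    proportions = y[: n_bags * k].reshape(n_bags, k).mean(axis=1)
    return bags, proportions
```

With binary labels, each bag's proportion lies in {0, 1/k, 2/k, ..., 1}, which is the only label information the learner observes.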
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models, memory, or other machine specifications used to run the experiments.
Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2015) for optimization, but does not specify versions for any programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | "On all datasets we choose a batch size of N = 1024 and use bag sizes k = 2^i for i = 0, ..., 9. For each batch, we compute the gradient of the aggregate loss function and use Adam (Kingma & Ba, 2015) to update the model parameters. We repeat the above training procedure 10 times for each loss function, dataset, bag size, and Adam learning rate in the set {10^-6, 5·10^-6, 10^-5, 10^-4, 5·10^-4, 10^-3, 5·10^-3, 10^-2}."
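The sweep described in the experiment-setup row is a grid over bag sizes, learning rates, and 10 repeats per configuration. A minimal sketch of enumerating that grid follows; the `losses` and `datasets` lists are placeholders (the actual losses and datasets are those named in the paper), so the total run count below depends on those placeholder lengths.

```python
import itertools

bag_sizes = [2 ** i for i in range(10)]  # k = 1, 2, 4, ..., 512
learning_rates = [1e-6, 5e-6, 1e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
repeats = 10
batch_size = 1024  # N

# Placeholder names for illustration only.
losses = ["proposed_llp", "baseline_a", "baseline_b"]
datasets = ["mnist", "cifar10", "higgs", "adult"]

runs = [
    dict(loss=l, dataset=d, k=k, lr=lr, repeat=r)
    for l, d, k, lr, r in itertools.product(
        losses, datasets, bag_sizes, learning_rates, range(repeats)
    )
]
```

Enumerating the grid up front makes it easy to shard runs across workers and to verify that every (loss, dataset, k, lr) cell received all 10 repeats.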