Learning With Multi-Group Guarantees For Clusterable Subpopulations

Authors: Jessica Dai, Nika Haghtalab, Eric Zhao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our multi-objective approach achieves O(T^{1/2}) online error without requiring separability in the underlying clusters. This is in contrast to the cluster-then-predict approach, for which we demonstrate O(T^{2/3}) error rates even under separability assumptions. Along the way, we prove that providing per-subgroup calibration guarantees for underlying clusters can be easier than learning the clusters: separation between median subgroup features is required for the latter but not the former.
Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, U.C. Berkeley, Berkeley, CA, USA. Correspondence to: Jessica Dai <EMAIL>.
Pseudocode | Yes | Algorithm 1: Cluster-Then-Predict Algorithm for Minimizing DCE. Algorithm 2: Online Multicalibration Algorithm for Coverable Distinguishers. Algorithm 3: Algorithm for computing a cover. Algorithm 4: Online multicalibration algorithm.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper discusses a generative model where instances are generated from a mixture of k distributions, and uses theoretical constructs like Gaussian mixture models. It does not refer to any specific publicly available datasets used for empirical evaluation.
Dataset Splits | No | The paper primarily presents theoretical work on algorithms and guarantees for generative models. It does not describe experiments that would involve splitting datasets into training, testing, or validation sets.
Hardware Specification | No | The paper is theoretical in nature, focusing on algorithmic approaches and error rates. It does not describe any empirical experiments or mention specific hardware used for computations.
Software Dependencies | No | The paper describes theoretical algorithms and proves bounds. It does not mention any specific software dependencies with version numbers that would be required to reproduce experimental results.
Experiment Setup | No | This paper is theoretical and does not describe empirical experiments. Therefore, there are no details provided regarding experimental setup, hyperparameters, or training configurations.
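The paper's guarantees concern per-subgroup (multi-group) calibration error. As an illustration only — this is a standard binned calibration-error computation, not the paper's Algorithm 2, and the function name, binning scheme, and bin count are assumptions of this sketch — the quantity being bounded can be estimated per group like so:

```python
import numpy as np

def group_calibration_error(preds, outcomes, groups, n_bins=10):
    """Illustrative per-group expected calibration error (not the paper's algorithm).

    For each group, bucket predictions into n_bins equal-width bins and sum
    |mean prediction - mean outcome| per bin, weighted by the bin's mass.
    """
    errors = {}
    for g in np.unique(groups):
        mask = groups == g
        p, y = preds[mask], outcomes[mask]
        err = 0.0
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            # include the right endpoint in the last bin
            in_bin = (p >= lo) & ((p < hi) if b < n_bins - 1 else (p <= hi))
            if in_bin.any():
                err += in_bin.mean() * abs(p[in_bin].mean() - y[in_bin].mean())
        errors[g] = err
    return errors
```

A per-subgroup guarantee of the kind the paper studies asks that this quantity be small for every group simultaneously, even when group membership (the cluster) is never explicitly learned.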