Multiaccuracy and Multicalibration via Proxy Groups

Authors: Beepul Bharti, Mary Versa Clemens-Sewall, Paul Yi, Jeremias Sulam

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through several experiments on real-world datasets, we illustrate that approximate multiaccuracy and multicalibration can be achieved even when sensitive group data is incomplete or unavailable." Experimental results are detailed in Section 6.
Researcher Affiliation | Academia | (1) Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA; (2) Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, USA; (3) Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, USA; (4) St. Jude Children's Research Hospital, Arlington, USA; (5) Department of Computer Science, Johns Hopkins University, Baltimore, USA.
Pseudocode | Yes | Algorithm 1: Multiaccuracy Regression; Algorithm 2: Multicalibration Boosting.
Open Source Code | Yes | The code necessary to reproduce the experiments is available at https://github.com/Sulam-Group/proxy_ma-mc.
Open Datasets | Yes | "We illustrate various aspects of our theoretical results on two tabular datasets, ACSIncome and ACSPublicCoverage (Ding et al., 2021), as well as on the CheXpert medical imaging dataset (Irvin et al., 2019)."
Dataset Splits | Yes | For the ACS datasets, a fixed 10% of the samples is held out as the evaluation set. The remaining 90% is split into training and validation sets, with 60% used for training the model f and the proxies Ĝ and 30% for adjusting f. All reported results are averages over five train/validation splits, evaluated on the fixed evaluation set. For CheXpert, the training, calibration, and evaluation splits provided by Glocker et al. (2023) are used.
Hardware Specification | No | The paper describes using a DenseNet-121 model pretrained on ImageNet for feature extraction and end-to-end training, but it does not specify any hardware details (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using logistic regression, decision trees, random forests, and DenseNet-121 models, but it does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch 1.9, TensorFlow 2.x, scikit-learn 1.x).
Experiment Setup | No | The paper describes the types of models used (logistic regression, decision tree, random forest, DenseNet-121) and the datasets, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed training configurations for these models.
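The paper's Algorithms 1 and 2 are not reproduced in this report. As a rough, hypothetical illustration of the general multiaccuracy post-processing idea they build on (iteratively auditing a predictor's residuals over a class of functions standing in for proxy groups, and correcting in the direction of any detected bias), the following sketch uses a simple least-squares auditor. Function names, the auditor class, and all parameter values are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def ls_auditor(X, r):
    """Least-squares auditor (illustrative stand-in for auditing over proxy
    groups): fit a linear function of the features to the residuals r and
    return it as a callable."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, r, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def multiaccuracy_boost(f0, X, y, audit=ls_auditor, eta=0.5, rounds=20, tol=1e-4):
    """Iteratively correct scores f0 until the auditor can no longer find a
    direction (a proxy group) with substantial residual bias."""
    f = np.clip(np.asarray(f0, dtype=float), 1e-6, 1 - 1e-6)
    for _ in range(rounds):
        r = y - f                    # residuals of the current predictor
        h = audit(X, r)              # auditor's best approximation of the residual
        corr = h(X)
        if np.mean(corr * r) < tol:  # no detectable residual bias left: stop
            break
        f = np.clip(f + eta * corr, 1e-6, 1 - 1e-6)  # correct toward the bias
    return f
```

A richer auditor class (e.g. decision trees over features plus proxy-group predictions) would detect correspondingly richer families of subgroup biases; the loop structure is unchanged.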
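The ACS split protocol described in the Dataset Splits row (a fixed 10% evaluation set, with the remainder divided 60%/30% between training and adjustment) can be sketched as follows; the function name, seed handling, and exact rounding are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def acs_style_split(n, seed=0):
    """Split n sample indices per the reported ACS protocol: 10% evaluation,
    60% training (model f and proxies), 30% validation (adjusting f)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_eval = int(0.1 * n)            # fixed 10% evaluation set
    n_train = int(0.6 * n)           # 60% for training f and the proxies
    eval_idx = idx[:n_eval]
    train_idx = idx[n_eval:n_eval + n_train]
    val_idx = idx[n_eval + n_train:] # remaining ~30% for adjusting f
    return train_idx, val_idx, eval_idx
```

Repeating the call with five different seeds while holding the evaluation indices fixed would mirror the paper's averaging over five train/validation splits.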