Multiaccuracy and Multicalibration via Proxy Groups
Authors: Beepul Bharti, Mary Versa Clemens-Sewall, Paul Yi, Jeremias Sulam
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through several experiments on real-world datasets, we illustrate that approximate multiaccuracy and multicalibration can be achieved even when sensitive group data is incomplete or unavailable. Experimental results are detailed in Section 6. |
| Researcher Affiliation | Academia | 1Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA; 2Mathematical Institute of Data Science, Johns Hopkins University, Baltimore, USA; 3Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, USA; 4St. Jude Children's Research Hospital, Arlington, USA; 5Department of Computer Science, Johns Hopkins University, Baltimore, USA. |
| Pseudocode | Yes | Algorithm 1: Multiaccuracy Regression; Algorithm 2: Multicalibration Boosting. |
| Open Source Code | Yes | The code necessary to reproduce these experiments is available at https://github.com/Sulam-Group/proxy_ma-mc. |
| Open Datasets | Yes | We illustrate various aspects of our theoretical results on two tabular datasets, ACSIncome and ACSPublicCoverage (Ding et al., 2021), as well as on the CheXpert medical imaging dataset (Irvin et al., 2019). |
| Dataset Splits | Yes | For the ACS datasets, we use a fixed 10% of the samples as the evaluation set. The remaining 90% of the data is split into training and validation sets, with 60% used for training the model f and proxies Ĝ and 30% for adjusting f. All reported results are averages over five train/validation splits on the evaluation set. For CheXpert, we use the splits provided by (Glocker et al., 2023) for training, calibration, and evaluation. |
| Hardware Specification | No | The paper describes using a DenseNet-121 model pretrained on ImageNet for feature extraction and end-to-end training, but it does not specify any hardware details (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using logistic regression, decision trees, Random Forests, and DenseNet-121 models. However, it does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch 1.9, TensorFlow 2.x, scikit-learn 1.x). |
| Experiment Setup | No | The paper describes the types of models used (logistic regression, decision tree, Random Forest, DenseNet-121) and the datasets, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed training configurations for these models. |
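The ACS split protocol reported above (a fixed 10% evaluation set, with the remaining 90% divided into 60% for training the model and proxies and 30% for adjustment) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and it assumes the 60%/30% figures refer to fractions of the full dataset (they sum with the 10% evaluation set to 100%).

```python
import numpy as np

def acs_style_split(n_samples, seed=0):
    """Hypothetical sketch of the reported ACS split:
    10% fixed evaluation, 60% training (model f and proxies),
    30% adjustment, all as fractions of the full dataset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_eval = int(0.10 * n_samples)
    n_train = int(0.60 * n_samples)
    eval_idx = idx[:n_eval]                    # fixed 10% evaluation set
    train_idx = idx[n_eval:n_eval + n_train]   # 60% for training f and proxies
    adjust_idx = idx[n_eval + n_train:]        # remaining ~30% for adjusting f
    return train_idx, adjust_idx, eval_idx
```

To mirror the paper's averaging over five train/validation splits, one would hold `eval_idx` fixed and re-draw only the train/adjust partition with five different seeds.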