PAC Learning with Improvements

Authors: Idan Attias, Avrim Blum, Keziah Naggita, Donya Saless, Dravyansh Sharma, Matthew Walter

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: "In Section 6, we conduct experiments on three real-world and one fully synthetic binary classification tabular datasets to investigate how the error rate of a model function h decreases when test-set agents that it initially classified as negative improve. Our results indicate that while risk-averse models may start with higher error rates, their errors drop rapidly as the negatively classified test agents improve and the improvement budget r increases."
Researcher Affiliation: Academia
LLM Response: "Idan Attias (1,2), Avrim Blum (2), Keziah Naggita (2), Donya Saless (2), Dravyansh Sharma (2,3), Matthew Walter (2). (1) University of Illinois at Chicago; (2) Toyota Technological Institute at Chicago; (3) Northwestern University. Correspondence to: Keziah Naggita <EMAIL>, Donya Saless <EMAIL>."
Pseudocode: Yes
LLM Response: "Below are the steps of the improvement algorithm we used to compute each agent's improvement features.

Initialization: x^(0) = x_orig

Iterative updates, for t = 0, 1, ..., T-1:
1. Compute the gradient of the loss L with respect to the agent's current features x^(t):
   g^(t) = ∇_{x^(t)} L(h(x^(t)), h(x_orig))
2. Update the improvement features by taking a step in the direction of the sign of the gradient on the set S of improvable features:
   ρ^(t)[i] = α · sign(g^(t)[i]) if i ∈ S, and 0 otherwise, for i ∈ [d];
   x^(t+1) = x^(t) + ρ^(t)
3. Project the updated features back onto the r-ball around the original features x_orig:
   x^(t+1) ← x_orig + clip_[-r,r](x^(t+1) - x_orig)

Improvement vector: after T iterations, the agent's final improvement is given by x^(T) - x_orig."
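The quoted procedure is projected sign-gradient stepping restricted to an improvable feature subset S, with an L-infinity projection onto the r-ball around the original features. A minimal NumPy sketch (the function name improve_features, the toy linear gradient, and all parameter values are our assumptions, not the paper's):

```python
import numpy as np

def improve_features(x_orig, grad_fn, S, r, alpha=0.1, T=20):
    """Sign-gradient improvement steps on the improvable subset S,
    projected back onto the L-infinity r-ball around x_orig."""
    x = x_orig.copy()
    for _ in range(T):
        g = grad_fn(x)                       # gradient of the loss w.r.t. x
        step = np.zeros_like(x)
        step[S] = alpha * np.sign(g[S])      # move only the improvable features
        x = x + step
        # project onto the r-ball around the original features
        x = x_orig + np.clip(x - x_orig, -r, r)
    return x - x_orig                        # the improvement vector

# toy example: a fixed (state-independent) gradient, features 0 and 2 improvable
w = np.array([1.0, -2.0, 0.5])
delta = improve_features(np.zeros(3), lambda x: -w, S=[0, 2], r=0.3)
```

The projection guarantees that no coordinate of the improvement vector ever exceeds the budget r in magnitude, regardless of the step size alpha or the number of iterations T.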
Open Source Code: Yes
LLM Response: "Our code is publicly available here."
Open Datasets: Yes
LLM Response: "We use three real-world tabular datasets, the Adult UCI dataset (Becker & Kohavi, 1996) and the OULAD and Law School datasets (Le Quy et al., 2022a), as well as a synthetic 8-dimensional binary classification dataset with class separability 4 and minimal outliers, generated using Scikit-learn's make_classification function (Pedregosa et al., 2011). In each case we train a zero-error model f on the entire dataset, which we treat as the true labeling function for our experiments. Let S_T = {(x, y) | x ∈ R^d, y ∈ {0, 1}} denote the dataset (e.g., Adult), where x is the feature vector and y = f(x) is the label. For all experiments, we split S_T into training S_train (70%) and testing S_test (30%) subsets. Further dataset details, including improvement features and class distributions, are provided in Appendix E.1."
Dataset Splits: Yes
LLM Response: "For all experiments, we split S_T into training S_train (70%) and testing S_test (30%) subsets."
Hardware Specification: Yes
LLM Response: "All experiments were conducted on a laptop computer with the following hardware specifications: 2.6 GHz 6-core Intel Core i7 processor, 16 GB of 2400 MHz DDR4 RAM, and an Intel UHD Graphics 630 graphics card with 1536 MB of memory."
Software Dependencies: No
LLM Response: "We trained two-layer neural networks, denoted as h functions, using PyTorch with the Adam optimizer, a learning rate of 0.001, and a batch size of 64. These h functions generate decisions for the test-set agents. When a test agent receives a negative classification, it can, if within budget, improve its feature values to obtain the desired classification from the h function."
Experiment Setup: Yes
LLM Response: "We trained two-layer neural networks, denoted as h functions, using PyTorch with the Adam optimizer, a learning rate of 0.001, and a batch size of 64. These h functions generate decisions for the test-set agents. When a test agent receives a negative classification, it can, if within budget, improve its feature values to obtain the desired classification from the h function. Table 4 summarizes the performance metrics of the f and h model functions, demonstrating their varied performance across the datasets. Since the empirical setup evaluates the impact of improvement on h's error drop rates, we vary the loss functions used to train the model h. We use the standard binary cross-entropy (BCE) loss and the risk-averse weighted-BCE (wBCE) loss defined in Equation 5."
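A minimal PyTorch sketch of this training setup (two-layer network, Adam, learning rate 0.001). The hidden width, the toy data, and the exact weighting inside weighted_bce are our assumptions; the paper's wBCE is defined in its Equation 5, which we do not reproduce here. For brevity this trains full-batch rather than with the paper's batch size of 64:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# two-layer network h; hidden width 16 is an illustrative choice
h = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(h.parameters(), lr=0.001)

def weighted_bce(logits, targets, w_neg=2.0):
    # risk-averse weighting sketch: negative examples count more, so
    # false positives are penalized more heavily than false negatives
    per_sample = nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    weights = torch.where(targets > 0.5, torch.ones_like(targets),
                          torch.full_like(targets, w_neg))
    return (weights * per_sample).mean()

# toy separable data: label 1 iff x0 + x1 > 0
X = torch.randn(256, 2)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)

losses = []
for _ in range(200):
    opt.zero_grad()
    loss = weighted_bce(h(X), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Setting w_neg > 1 makes the model reluctant to predict positive, which matches the report's observation that risk-averse models start with higher error rates before improvement.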