PAC Learning with Improvements
Authors: Idan Attias, Avrim Blum, Keziah Naggita, Donya Saless, Dravyansh Sharma, Matthew Walter
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we conduct experiments on three real-world and one fully synthetic binary classification tabular datasets to investigate how the error rate of a model function (h) decreases when test-set agents that it initially classified as negative improve. Our results indicate that while risk-averse models may start with higher error rates, their errors drop rapidly as the negatively classified test agents improve and the improvement budget (r) increases. |
| Researcher Affiliation | Academia | Idan Attias^{1,2}, Avrim Blum^2, Keziah Naggita^2, Donya Saless^2, Dravyansh Sharma^{2,3}, Matthew Walter^2. ^1 University of Illinois at Chicago; ^2 Toyota Technological Institute at Chicago; ^3 Northwestern University. Correspondence to: Keziah Naggita <EMAIL>, Donya Saless <EMAIL>. |
| Pseudocode | Yes | Below are the steps of the improvement algorithm we used to compute each agent's improvement features. Initialization: x^(0) = x_orig. Iterative updates, for t = 0, 1, ..., T−1: 1. Compute the gradient of the loss L with respect to the agent's updated features x^(t): g^(t) = ∇_{x^(t)} L(h(x^(t)), h(x_orig)). 2. Update the improvement features by taking a step in the direction of the sign of the gradient: x^(t+1) = x^(t) + ρ^(t), where ρ^(t)[i] = α · sign(g^(t)[i]) if i ∈ S, and 0 otherwise, for i ∈ [d]. 3. Project the updated improvement features back onto the r-ball around the original features x_orig: x^(t+1) ← x_orig + clip_[−r,r](x^(t+1) − x_orig). Improvement vector: after T iterations, the final agent's improvement is given by x^(T) − x_orig. |
| Open Source Code | Yes | Our code is publicly available here. |
| Open Datasets | Yes | We use three real-world tabular datasets: the Adult UCI dataset (Becker & Kohavi, 1996), the OULAD and Law School datasets (Le Quy et al., 2022a), and a synthetic 8-dimensional binary classification dataset with class separability 4 and minimal outliers, generated using Scikit-learn's make_classification function (Pedregosa et al., 2011). In each case we train a zero-error model f on the entire dataset, which we treat as the true labeling function for our experiments. Let ST = {(x, y) \| x ∈ R^d, y ∈ {0, 1}} represent the dataset (e.g., Adult), where x is the feature vector and y = f(x) is the label. For all experiments, we split ST into training Strain (70%) and testing Stest (30%) subsets. Further dataset details, including improvement features and class distributions, are provided in Appendix E.1. |
| Dataset Splits | Yes | For all experiments, we split ST into training Strain (70%) and testing Stest (30%) subsets. |
| Hardware Specification | Yes | All experiments were conducted on a laptop computer with the following hardware specifications: 2.6-GHz 6-Core Intel Core i7 processor, 16 GB of 2400-MHz DDR4 RAM, and an Intel UHD Graphics 630 graphics card with 1536 MB of memory. |
| Software Dependencies | No | We trained two-layer neural networks, denoted as h functions, using PyTorch with the Adam optimizer, a learning rate of 0.001, and a batch size of 64. These h functions generate decisions for the test-set agents. In cases where a test agent receives a negative classification, it can, if within budget, improve its feature values to get the desired classification from the h function. |
| Experiment Setup | Yes | We trained two-layer neural networks, denoted as h functions, using PyTorch with the Adam optimizer, a learning rate of 0.001, and a batch size of 64. These h functions generate decisions for the test-set agents. In cases where a test agent receives a negative classification, it can, if within budget, improve its feature values to get the desired classification from the h function. Table 4 summarizes the performance metrics of the f and h model functions, demonstrating their varied performance across the datasets. Since the empirical setup evaluates the impact of improvement on h's error drop rates, we vary the loss functions with which we train the model h. We use the standard binary cross-entropy (BCE) loss and the risk-averse weighted-BCE (wBCE) loss defined in Equation 5. |
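The iterative sign-gradient improvement procedure quoted in the Pseudocode row can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's code: the function name, the treatment of `h(x_orig)` as a hard 0/1 decision, and the default step size and iteration count are assumptions.

```python
import torch
import torch.nn.functional as F

def improve_features(x_orig, h, improvable_idx, r, alpha=0.01, T=100):
    """Sketch of the described improvement loop: an agent perturbs only
    the features in improvable_idx (the set S), ascending the loss
    between h's current output and its original decision, while staying
    in the L-infinity r-ball around x_orig."""
    # Assumption: treat the model's original output as a hard decision.
    y_orig = (h(x_orig) > 0.5).float().detach()
    mask = torch.zeros_like(x_orig)
    mask[improvable_idx] = 1.0  # only features in S may change
    x = x_orig.clone()
    for _ in range(T):
        x = x.detach().requires_grad_(True)
        loss = F.binary_cross_entropy(h(x), y_orig)
        loss.backward()
        # step in the sign of the gradient, restricted to S
        x = x.detach() + alpha * torch.sign(x.grad) * mask
        # project back onto the r-ball around the original features
        x = x_orig + torch.clamp(x - x_orig, -r, r)
    return x - x_orig  # the agent's improvement vector
```

For a negatively classified agent (`y_orig = 0`), ascending the BCE against that label pushes `h(x)` toward a positive decision, matching the described behavior of in-budget agents.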
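The synthetic-data and split setup quoted in the Open Datasets and Dataset Splits rows can be reproduced with standard scikit-learn calls. The table states 8 features, class separability 4, and a 70/30 train/test split; the sample count and random seeds below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 8-dimensional binary classification data with high class separability
# (class_sep=4, per the table); n_samples and random_state are assumed.
X, y = make_classification(n_samples=1000, n_features=8,
                           class_sep=4.0, flip_y=0.0, random_state=0)

# 70% train / 30% test, as described for all experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
```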
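The training configuration in the Experiment Setup row (two-layer network h, Adam with learning rate 0.001, batch size 64, BCE vs. a risk-averse weighted BCE) can be sketched as below. The hidden width and the class weights in `weighted_bce` are assumptions; the paper's Equation 5 defines the actual wBCE loss.

```python
import torch
from torch import nn

def make_h(d_in, hidden=32):
    # Two-layer network h with sigmoid output; hidden width is assumed.
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, 1), nn.Sigmoid())

def weighted_bce(pred, target, w_pos=1.0, w_neg=5.0):
    # Up-weight mistakes on negative examples: one plausible risk-averse
    # weighting, standing in for the paper's Equation 5.
    weights = target * w_pos + (1.0 - target) * w_neg
    return nn.functional.binary_cross_entropy(pred, target, weight=weights)

def train_step(h, opt, xb, yb, loss_fn):
    # One optimizer step on a batch (the paper uses batches of 64).
    opt.zero_grad()
    loss = loss_fn(h(xb).squeeze(1), yb)
    loss.backward()
    opt.step()
    return loss.item()
```

Training with `loss_fn=nn.functional.binary_cross_entropy` gives the standard-BCE model, while `loss_fn=weighted_bce` gives the risk-averse variant compared in the experiments.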