A Market for Accuracy: Classification Under Competition
Authors: Ohad Einav, Nir Rosenfeld
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We end with a series of experiments using synthetic and real data that demonstrate the underlying mechanics of accuracy markets and how they operate. Our results demonstrate that learning in such markets can be feasible, that competition converges quickly, and that the market is typically highly efficient and favorable to users. We experiment with three datasets: COMPAS-Arrest, COMPAS-Violence, and Adult, and consider several learning algorithms, including linear SVMs, boosted trees (using XGBoost), and random forests. |
| Researcher Affiliation | Academia | 1Faculty of Computer Science, Technion Israel Institute of Technology. Correspondence to: Nir Rosenfeld <EMAIL>. |
| Pseudocode | No | The paper describes methods and derivations mathematically and in prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/BML-Technion/market4acc. |
| Open Datasets | Yes | We experiment with three datasets: COMPAS-Arrest, COMPAS-Violence, and Adult, and consider several learning algorithms, including linear SVMs, boosted trees (using XGBoost), and random forests. The COMPAS datasets originated from studies of recidivism in the United States (Angwin et al., 2016), and are used to predict whether a criminal will be rearrested for general crimes and violent crimes, respectively. The COMPAS-Arrest dataset was preprocessed for analysis by Marx et al. (2020), and a copy of their CSV files is included in their code. The CSV files can be found at: https://github.com/charliemarx/pmtools/tree/master/data. |
| Dataset Splits | Yes | For all experiments, the dataset was split into training, validation, and test sets. The test set comprised 20% of the data and was held out for final performance evaluation. The validation set, also comprising 20% of the data, was used for hyperparameter tuning when applicable. In cases where no hyperparameter tuning was performed, the validation set was not utilized, and so only the training and test sets were used. For each split, the data was shuffled and divided into training, validation, and test sets according to the above proportions. |
| Hardware Specification | Yes | All experiments were run in the PyCharm IDE on a single MacBook Pro laptop with 16 GB of RAM and an M2 processor, with no GPU support. |
| Software Dependencies | No | The paper mentions using Python, sklearn, and xgboost packages, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Linear SVM: regularization parameter C = 1.0; all other hyperparameters left at their defaults. XGBoost: learning rate 0.3, max tree depth 6; all other hyperparameters left at their defaults, in particular row and column subsampling of 1; the loss metric used for boosting is log-loss. Random Forest: 10 estimators; max tree depth left at the default, meaning nodes are expanded until all leaves are pure or contain a single sample; all other hyperparameters left at their defaults. |
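The reported split protocol (20% test, 20% validation, shuffled) and model hyperparameters can be sketched as below. This is an illustrative reconstruction, not the authors' released code (see the linked repository for that); the function name, seed, and dataset size are assumptions, and unlisted hyperparameters are left to library defaults as the report states.

```python
import random

def split_indices(n, seed=0, val_frac=0.2, test_frac=0.2):
    """Shuffle row indices and split into train/validation/test sets
    (60/20/20 by default, matching the reported protocol)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    # Held-out test set first, then validation, remainder is training.
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

# Hyperparameters as reported; every unlisted value stays at the library default.
svm_params = {"C": 1.0}                      # sklearn linear SVM
xgb_params = {"learning_rate": 0.3,          # xgboost.XGBClassifier
              "max_depth": 6,
              "subsample": 1.0,              # row subsampling of 1
              "colsample_bytree": 1.0}       # column subsampling of 1
rf_params = {"n_estimators": 10,             # sklearn RandomForestClassifier
             "max_depth": None}              # expand until leaves are pure

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 600 200 200
```

The validation set is only consumed when hyperparameter tuning is performed; otherwise it can simply be ignored, as the report notes.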