A Market for Accuracy: Classification Under Competition
Authors: Ohad Einav, Nir Rosenfeld
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We end with a series of experiments using synthetic and real data that demonstrate the underlying mechanics of accuracy markets and how they operate. Our results demonstrate that learning in such markets can be feasible, that competition converges quickly, and that the market is typically highly efficient and favorable to users. We experiment with three datasets: COMPAS-Arrest, COMPAS-Violence, and Adult, and consider several learning algorithms, including linear SVMs, boosted trees (using XGBoost), and random forests. |
| Researcher Affiliation | Academia | 1Faculty of Computer Science, Technion Israel Institute of Technology. Correspondence to: Nir Rosenfeld <EMAIL>. |
| Pseudocode | No | The paper describes methods and derivations mathematically and in prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/BML-Technion/market4acc. |
| Open Datasets | Yes | We experiment with three datasets: COMPAS-Arrest, COMPAS-Violence, and Adult, and consider several learning algorithms, including linear SVMs, boosted trees (using XGBoost), and random forests. The COMPAS datasets originated from studies of recidivism in the United States (Angwin et al., 2016), and are used to predict whether a criminal will be rearrested for general crimes and violent crimes, respectively. The COMPAS-Arrest dataset was preprocessed for analysis by Marx et al. (2020), and a copy of their CSV files is included in their code. The CSV files can be found at: https://github.com/charliemarx/pmtools/tree/master/data. |
| Dataset Splits | Yes | For all experiments, the dataset was split into training, validation, and test sets. The test set comprised 20% of the data and was held out for final performance evaluation. The validation set, also comprising 20% of the data, was used for hyperparameter tuning when applicable. In cases where no hyperparameter tuning was performed, the validation set was not utilized, and so only the training and test sets were used. For each split, the data was shuffled and divided into training, validation, and test sets according to the above proportions. |
| Hardware Specification | Yes | All experiments were run in the PyCharm IDE on a single MacBook Pro laptop with 16 GB of RAM and an M2 processor, with no GPU support. |
| Software Dependencies | No | The paper mentions using Python, sklearn, and xgboost packages, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Linear SVM: regularization parameter C = 1.0; all other hyperparameters left at their defaults. XGBoost: learning rate 0.3, max tree depth 6; all other hyperparameters left at their defaults, in particular row and column subsampling of 1; the loss metric used for boosting is log-loss. Random Forest: 10 estimators; max tree depth left at the default, meaning nodes are expanded until all leaves are pure or contain a single sample; all other hyperparameters left at their defaults. |
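The reported split protocol (20% test, 20% validation, shuffled) and model hyperparameters can be sketched as below. This is an illustrative reconstruction, not the authors' released code (see the linked repository for that); the function name, seed, and dataset size are assumptions, and unlisted hyperparameters are left to library defaults as the report states.

```python
import random

def split_indices(n, seed=0, val_frac=0.2, test_frac=0.2):
    """Shuffle row indices and split into train/validation/test sets
    (60/20/20 by default, matching the reported protocol)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    # Held-out test set first, then validation, remainder is training.
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

# Hyperparameters as reported; every unlisted value stays at the library default.
svm_params = {"C": 1.0}                      # sklearn linear SVM
xgb_params = {"learning_rate": 0.3,          # xgboost.XGBClassifier
              "max_depth": 6,
              "subsample": 1.0,              # row subsampling of 1
              "colsample_bytree": 1.0}       # column subsampling of 1
rf_params = {"n_estimators": 10,             # sklearn RandomForestClassifier
             "max_depth": None}              # expand until leaves are pure

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 600 200 200
```

The validation set is only consumed when hyperparameter tuning is performed; otherwise it can simply be ignored, as the report notes.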