Learning with Selectively Labeled Data from Multiple Decision-makers

Authors: Jian Chen, Zhehao Li, Xiaojie Mao

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Finally, we theoretically and numerically validate the efficacy of our proposed method. We briefly demonstrate the superior performance of our proposed method through numerical experiments in Section 7, with the comprehensive experiments in Appendix E."
Researcher Affiliation | Academia | "School of Economics and Management, Tsinghua University, Beijing, China. Correspondence to: Xiaojie Mao <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Unified Cost-sensitive Learning"
Open Source Code | Yes | The paper refers readers to the code and data at https://github.com/Zhehao97/Learning-Selective-Labels.git.
Open Datasets | Yes | "This dataset consists of 10459 observations of approved home loan applications. ... based on the home loans dataset from (FICO, 2018)." URL: https://community.fico.com/s/explainable-machine-learning-challenge?tabset-158d9=3
Dataset Splits | Yes | "We randomly split our data into training and testing sets at a 7:3 ratio."
Hardware Specification | No | The paper conducts numerical experiments using synthetic and semi-synthetic datasets but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions implementing parts of the method in PyTorch and using algorithms such as AdaBoost, Gradient Boosting, Logistic Regression, Random Forest, and SVM, but provides no version numbers for PyTorch or any other software library.
Experiment Setup | Yes | "In this experiment, we simulate a full dataset with sample size N = 10000, feature dimensions p = q = 5, the number of instrument levels J = 5, and the label classes K = 3. ... The cost-sensitive classification problem is solved by a simple neural network with a softmax output layer. We selected all hyperparameters through 5-fold cross-validation. The missingness decision D ∈ {0, 1} is modeled as a Bernoulli variable with parameter p_D := P(D = 1 | X, U, Z), and the true label Y ∈ {1, ..., K} is modeled as a categorical variable with parameters p_k := P(Y = k | X, U, Z) for k ∈ [K]. ... The parameter α_D ∈ (0, 1) controls the impact of the unobservable variables U on the labeling process and thus the degree of selection bias, while the parameter α_Y ∈ (0, 1) adjusts the magnitude of U's effect on the distribution of the outcome Y."
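For context, the data-generating process described in the quoted experiment setup (a Bernoulli labeling decision D driven by features X, an unobservable U, and a J-level instrument Z, plus a K-class categorical label Y observed only when D = 1) can be sketched as below. The logistic and softmax functional forms, the coefficient values, and the α settings are illustrative assumptions, not the paper's actual specification.

```python
import math
import random

random.seed(0)

N, p, J, K = 10000, 5, 5, 3
alpha_D, alpha_Y = 0.5, 0.5  # assumed values; control the influence of U

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

data = []
for _ in range(N):
    X = [random.gauss(0.0, 1.0) for _ in range(p)]  # observed features
    U = random.gauss(0.0, 1.0)                      # unobservable confounder
    Z = random.randrange(J)                         # instrument level (e.g. decision-maker id)
    # Labeling decision D ~ Bernoulli(p_D): illustrative logistic form where
    # U enters with weight alpha_D and the instrument shifts leniency.
    logit_D = sum(X) / p + alpha_D * U + 0.3 * (Z - J / 2)
    p_D = 1.0 / (1.0 + math.exp(-logit_D))
    D = 1 if random.random() < p_D else 0
    # Label Y ~ Categorical(p_1, ..., p_K): U enters with weight alpha_Y.
    scores = [(k + 1) * sum(X) / p + alpha_Y * U * k for k in range(K)]
    p_k = softmax(scores)
    Y = random.choices(range(K), weights=p_k)[0] + 1  # classes 1..K
    data.append((X, Z, D, Y if D == 1 else None))     # Y observed only when D = 1

observed = sum(1 for (_, _, D, _) in data if D == 1)
```

Because Y is generated for every unit but recorded only when D = 1, the resulting sample exhibits exactly the selective-labels structure the paper studies: selection depends on the unobservable U, so the labeled subsample is not representative.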
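The table above points to Algorithm 1, "Unified Cost-sensitive Learning", without reproducing it. As background, the core decision rule shared by cost-sensitive classifiers is to predict the class minimizing expected cost under the estimated class probabilities, rather than simply taking the most probable class. A minimal sketch follows; the cost matrix and probability vector are made-up illustrations, and the paper's actual algorithm is more involved.

```python
def cost_sensitive_predict(class_probs, cost):
    """Return the class index minimizing expected misclassification cost.

    class_probs[k] = estimated P(Y = k | X);
    cost[j][k]     = cost of predicting class j when the true class is k.
    """
    K = len(class_probs)
    expected = [sum(cost[j][k] * class_probs[k] for k in range(K)) for j in range(K)]
    return min(range(K), key=lambda j: expected[j])

# Asymmetric costs: predicting class 0 when the truth is class 2 is very costly.
probs = [0.4, 0.35, 0.25]
costs = [[0, 1, 10],
         [1, 0, 1],
         [1, 1, 0]]
pred = cost_sensitive_predict(probs, costs)  # → 1, even though class 0 is most probable
```

With a uniform 0/1 cost matrix the rule reduces to the usual argmax over class probabilities; the asymmetric example shows how the prediction can shift away from the most probable class when one error type is much more expensive.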