Learning with Selectively Labeled Data from Multiple Decision-makers
Authors: Jian Chen, Zhehao Li, Xiaojie Mao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we theoretically and numerically validate the efficacy of our proposed method. We briefly demonstrate the superior performance of our proposed method through numerical experiments in Section 7, with the comprehensive experiments in Appendix E. |
| Researcher Affiliation | Academia | 1School of Economics and Management, Tsinghua University, Beijing, China. Correspondence to: Xiaojie Mao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Unified Cost-sensitive Learning |
| Open Source Code | Yes | Refers to the code and data at https://github.com/Zhehao97/Learning-Selective-Labels.git. |
| Open Datasets | Yes | This dataset consists of 10459 observations of approved home loan applications. ... based on the home loans dataset from (FICO, 2018). URL https://community.fico.com/s/explainable-machine-learning-challenge?tabset-158d9=3. |
| Dataset Splits | Yes | We randomly split our data into training and testing sets at 7 : 3 ratio. |
| Hardware Specification | No | The paper conducts numerical experiments using synthetic and semi-synthetic datasets but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions implementing parts of the method using PyTorch and utilizing algorithms such as AdaBoost, Gradient Boosting, Logistic Regression, Random Forest, and SVM. However, no specific version numbers for PyTorch or any other software libraries are provided. |
| Experiment Setup | Yes | In this experiment, we simulate a full dataset with sample size N = 10000, feature dimensions p = q = 5, the number of instrument levels J = 5, and the label classes K = 3. ... The cost-sensitive classification problem is solved by a simple neural network with a softmax output layer. We selected all hyperparameters through 5-fold cross-validation. The missingness decision D ∈ {0, 1} is modeled as a Bernoulli-distributed variable with parameter p_D := P(D = 1 \| X, U, Z), and the true label Y ∈ {1, . . . , K} is modeled as a categorical variable with parameters p_k := P(Y = k \| X, U, Z) for k ∈ [K]. ... The parameter α_D ∈ (0, 1) controls the impact of unobservable variables U on the labeling process and thus the degree of selection bias, while the parameter α_Y ∈ (0, 1) adjusts the magnitude of U affecting the distribution of outcome Y. |
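The simulated selective-labels setup quoted in the table can be sketched as follows. The sample size, dimensions (N, p, q, J, K), the Bernoulli missingness decision, the categorical outcome, and the 7:3 split all come from the report; the concrete functional forms of p_D and p_k, the weight matrices, and the choice of α_D = α_Y = 0.5 are illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p, q, J, K = 10_000, 5, 5, 5, 3  # sizes from the report
alpha_D, alpha_Y = 0.5, 0.5         # assumed values in (0, 1)

X = rng.normal(size=(N, p))     # observed features
U = rng.normal(size=(N, q))     # unobservable confounders
Z = rng.integers(0, J, size=N)  # instrument (decision-maker), J levels


def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Labeling decision D in {0, 1}: Bernoulli with p_D = P(D = 1 | X, U, Z);
# alpha_D scales the influence of U, i.e. the degree of selection bias.
# (This logistic form is an illustrative assumption.)
logit_D = X.sum(axis=1) + alpha_D * U.sum(axis=1) + 0.2 * Z
p_D = 1.0 / (1.0 + np.exp(-logit_D))
D = rng.binomial(1, p_D)

# True label Y in {1, ..., K}: categorical with p_k = P(Y = k | X, U, Z);
# alpha_Y scales how strongly U shifts the outcome distribution.
W_X, W_U = rng.normal(size=(p, K)), rng.normal(size=(q, K))
p_k = softmax(X @ W_X + alpha_Y * (U @ W_U))
Y = np.array([rng.choice(K, p=row) for row in p_k]) + 1

# Under selective labeling, Y is observed only where D = 1 (0 = missing).
Y_obs = np.where(D == 1, Y, 0)

# Random 7:3 train/test split, as stated in the Dataset Splits row.
perm = rng.permutation(N)
n_train = int(0.7 * N)
train_idx, test_idx = perm[:n_train], perm[n_train:]
```

The cost-sensitive classifier itself (a small softmax-output network tuned by 5-fold cross-validation, per the report) would then be trained on the labeled portion of the training split.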