Predictive Inference with Weak Supervision
Authors: Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets through several experiments." ... Keywords: Conformal inference, Confidence sets, Coverage validity, Weak supervision, Partial labels ... "To provide some initial insights into the methods and potential applications, we provide experiments on several real-world domains; in the main body (Section 5) we investigate ranking, while the appendices (see Appendix C) provide additional examples with structured prediction, matching for pedestrian tracking in videos, and prediction intervals for county-level voting in the United States." |
| Researcher Affiliation | Academia | Maxime Cauchois, EMAIL, Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA; Suyash Gupta, EMAIL, Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA; Alnur Ali, EMAIL, Department of Statistics, Stanford University, Stanford, CA 94305-4020, USA; John Duchi, EMAIL, Departments of Statistics and Electrical Engineering, Stanford University, Stanford, CA 94305-4020, USA |
| Pseudocode | Yes | Algorithm 1 Partially supervised conformalization Algorithm 2 Greedy weakly supervised scoring mechanism Algorithm 3 Sequential partitioning |
| Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code for the described methodology, nor does it provide a direct link to a code repository. It mentions the license for the paper itself, but not for software implementation. |
| Open Datasets | Yes | 5.2.2 Ranking experiment with Microsoft LETOR dataset ...Learning to rank with Microsoft LETOR dataset (Qin and Liu, 2013)... C.2 Pedestrian tracking with partial matching information ...Predicting trajectories in the MOT2D15 data set (Leal-Taixé et al., 2015)... C.3 Prediction intervals for weakly supervised regression ...Our data comes from the 2013–2017 American Community Survey 5-Year Estimates... |
| Dataset Splits | Yes | 5.1 A toy classification example ...we simulate n = 10^4 data points, splitting them into training (30%), calibration (20%) and test (50%) sets. 5.2.1 Ranking simulation study ...we simulate n = 10^4 i.i.d. different users, using the same (30, 20, 50) train/validation/test split as in Section 5.1. C.3 Prediction intervals for weakly supervised regression ...we split it into thirds: 33% of the counties (and their associated fractions of Democratic voters) go into the training set, 33% go into the calibration set, and the rest go into the test set; as our splits are random, they are exchangeable. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It describes models and datasets but lacks hardware specifications. |
| Software Dependencies | No | The paper mentions several algorithms and procedures like "List Net procedure (Cao et al., 2007)", "structured S-SVM approach (Tsochantaridis et al., 2004)", and "Hungarian algorithm". However, it does not specify any software libraries, frameworks, or operating systems with their version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | 5 Experiments 5.1 A toy classification example ...We vary the signal-to-noise ratio σ ∈ {10^-2, ..., 10^2}... we simulate n = 10^4 data points... We draw each θ_y uniformly on S^{d-1}, {X_i}_{i=1}^n i.i.d. N(0, I_d), choosing weak threshold T ∼ Uni[min_{y∈Y} S^Oracle_y, max_{y∈Y} S^Oracle_y]. We repeat the entire process N_trials = 20 times... 5.2.1 Ranking simulation study ...With K = 7 and d = 2... We use the same scoring model for both the fully supervised conformal (FSC) and weakly supervised conformal (WSC) procedures... scoring mechanism (10) with ψ(x, y) := (y − x)_+... 5.2.2 Ranking experiment with Microsoft LETOR dataset ...For each split (calibration/test), we first sample n = 2000 queries... select K ∈ {2, 4, 6, 8, 10, 20} documents... We repeat the entire simulation procedure N_trials = 20 times... pairwise comparison function ψ_c(r1, r2) := exp(−c·r1)(r2 − r1)_+ with c ∈ {0, 2, 5, 8}... C.3 Prediction intervals for weakly supervised regression ...for various values of µ ∈ {.01, .05, .1, .15, .2}... We set the miscoverage level α = .05. |
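The split sizes and miscoverage level reported above can be sketched as follows. This is a generic split-conformal illustration under our own assumptions (the paper reports no public implementation, so all names, the placeholder scores, and the calibration step below are ours, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reconstruction of the 30/20/50 train/calibration/test split
# reported in Section 5.1 (n = 10^4 simulated points).
n = 10_000
idx = rng.permutation(n)
n_train, n_cal = int(0.3 * n), int(0.2 * n)
train_idx = idx[:n_train]
cal_idx = idx[n_train:n_train + n_cal]
test_idx = idx[n_train + n_cal:]

# Standard split-conformal calibration at miscoverage alpha = 0.05:
# rank the conformity scores on the calibration set and take the
# ceil((n_cal + 1)(1 - alpha))-th smallest as the inclusion threshold.
alpha = 0.05
cal_scores = rng.exponential(size=n_cal)  # placeholder scores, not the paper's
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
threshold = np.sort(cal_scores)[min(k, n_cal) - 1]

# A candidate label y enters the confidence set iff score(x, y) <= threshold;
# the paper's weakly supervised procedure replaces these scores with ones
# computed from partial labels (its Algorithms 1-3).
```

With exchangeable splits, this threshold rule gives the usual (1 − α) marginal coverage guarantee for the fully supervised baseline the paper compares against.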