Robust Text Classification under Confounding Shift

Authors: Virgile Landeiro, Aron Culotta

JAIR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach does not make any causal conclusions, but by experimenting on 6 datasets, we show that our approach is able to outperform baselines 1) in controlled cases where confounding shift is manually injected between fitting time and prediction time, 2) in natural experiments where confounding shift appears either abruptly or gradually, and 3) in cases where there is one or multiple confounders.
Researcher Affiliation | Academia | Virgile Landeiro EMAIL, Aron Culotta EMAIL, Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616
Pseudocode | No | The paper describes the method using mathematical formulations and prose, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We also provide the code and datasets required to reproduce our experiments on GitHub: https://github.com/tapilab/jair-2018-confound
Open Datasets | Yes | We also provide the code and datasets required to reproduce our experiments on GitHub: https://github.com/tapilab/jair-2018-confound. To build this dataset, we use the data from Maas, Daly, Pham, Huang, Ng, and Potts (2011). It contains 50,000 movie reviews from IMDb labeled with positive or negative sentiment. ... For these experiments, we obtain the data from the 8th round of the Yelp Dataset Challenge.
Dataset Splits | Yes | For each btrain, btest pair, we sample 5 train/test splits and report the average accuracy. ... To do so, we fix the training data to an initial time period t, then sample testing data from future time periods t + g. The gap size g determines the time between the training and testing set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using "L2-regularized logistic regression" but does not specify any particular software library or its version number.
Experiment Setup | Yes | In our experiments, we use L2-regularized logistic regression. ... $\mathcal{L}(D, \theta) = \sum_{i \in D} \log p_\theta(y_i \mid x_i, z_i) - \lambda_x \sum_k (\theta^x_k)^2 - \lambda_z \sum_k (\theta^z_k)^2$ (7), where the terms $\lambda_x$ and $\lambda_z$ control the regularization strength of the term coefficients and confound coefficients, respectively. A default implementation would set $\lambda_x = \lambda_z = 1$. ... we only assigned the values 1 or 10 to the tuning parameter v of our approach.
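The objective quoted in the Experiment Setup row can be sketched directly. The following is a minimal NumPy sketch of Eq. (7), not the authors' implementation; the function and variable names (`log_likelihood_l2`, `theta_x`, `theta_z`, `lam_x`, `lam_z`) are illustrative, and how the paper's tuning parameter v maps onto the two regularization strengths is left out, since the quoted text does not specify it.

```python
import numpy as np

def log_likelihood_l2(theta_x, theta_z, X, Z, y, lam_x=1.0, lam_z=1.0):
    """Penalized log-likelihood of Eq. (7): logistic regression over text
    features X and confound features Z, with separate L2 strengths lam_x
    (term coefficients) and lam_z (confound coefficients)."""
    logits = X @ theta_x + Z @ theta_z
    p = 1.0 / (1.0 + np.exp(-logits))            # p_theta(y=1 | x, z)
    eps = 1e-12                                  # guard against log(0)
    ll = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1.0 - p + eps))
    # Subtract the two L2 penalties; the quoted default sets both to 1.
    return ll - lam_x * np.sum(theta_x ** 2) - lam_z * np.sum(theta_z ** 2)
```

With all coefficients at zero the penalties vanish and the objective reduces to |D| · log(0.5); regularizing the confound coefficients separately from the term coefficients is what gives the method a distinct knob to turn when confounding shift is expected.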