Label Alignment Regularization for Distribution Shift
Authors: Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H.S. Torr, Yangchen Pan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we first design a synthetic dataset to verify that our regularizer is indeed beneficial in a distribution shift setting by adjusting the classifier and to perform an ablation study on the role of removing implicit regularization. Then, we demonstrate the effectiveness of our method on a well-known benchmark where classic domain-adversarial methods are known to fail (Zhao et al., 2019). Last, we show our algorithm's practical utility in a cross-lingual sentiment classification task. |
| Researcher Affiliation | Collaboration | Ehsan Imani EMAIL University of Alberta, Alberta Machine Intelligence Institute, Guojun Zhang EMAIL Huawei Noah's Ark Lab, Runjia Li EMAIL Department of Engineering Science, University of Oxford, Jun Luo EMAIL Huawei Noah's Ark Lab, Pascal Poupart EMAIL School of Computer Science, University of Waterloo, Philip H.S. Torr EMAIL Yangchen Pan EMAIL Department of Engineering Science, University of Oxford |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code. Algorithm 1, Label Alignment Regression: Get data Φ, y, Φ̃, and hyperparameters t, α, k, k̃, λ. Compute covariance matrices ΦᵀΦ and Φ̃ᵀΦ̃. Perform eigendecomposition of ΦᵀΦ and Φ̃ᵀΦ̃ to get σ_{k+1:d}, σ̃_{k̃+1:d}, v_{k+1:d}, and ṽ_{k̃+1:d}. Initialize w to zero. For t iterations: perform a gradient step with respect to w on ‖Φw − y‖² − Σ_{i=k+1}^{d} σ_i² (wᵀv_i)² + λ Σ_{i=k̃+1}^{d} σ̃_i² (wᵀṽ_i)² with step size α and update w. |
| Open Source Code | Yes | An implementation is available at https://github.com/EhsanEI/lar/. |
| Open Datasets | Yes | Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis. ... XED (Öhman et al., 2020) is a sentence-level sentiment analysis dataset... UCI CT Scan: A random subset of the CT Position dataset on UCI (Graf et al., 2011). Song Year: A random subset of the training portion of the Million Song dataset (Bertin-Mahieux et al., 2011). Bike Sharing: A random subset of the Bike Sharing dataset on UCI (Fanaee-T and Gama, 2014). MNIST: The task is classifying any pair of two digits in MNIST. USPS: The task is classifying any pair of two digits in USPS. CIFAR-10: The task is classifying airplane and automobile in CIFAR-10 dataset... CIFAR-100: The task is classifying beaver and dolphin in CIFAR-100 dataset... STL-10: The task is classifying airplane and bird in STL-10 dataset... AG News: A random subset of the first two classes (World and Sports) in AG News document classification dataset. |
| Dataset Splits | Yes | We use the train split of the dataset for the source domain and the test split of the other dataset for the target domain. A small set of 100 labeled points from the target domain is used for hyperparameter selection as we have not developed a fully unsupervised hyperparameter selection strategy. ... The last three columns are like the second column except that, in binary classification between two digits, only a certain ratio of the lower digit in the source domain, as indicated in the header, is used. This subsampling creates a large degree of imbalance that, as Zhao et al. (2019) observed, poses a challenge to domain-adversarial methods. ... We perform 5 runs and in each one 100 points are randomly sampled from the target domain for validation and the rest are used for evaluation. |
| Hardware Specification | No | The paper mentions 'features from a ResNet-18 pretrained on ImageNet' and '768-dimensional sentence embeddings obtained with BERT (Devlin et al., 2019) models pre-trained on the corresponding languages'. However, it does not specify what hardware was used to run their experiments, only what models/features were used or pretrained on. There are no explicit mentions of GPUs, CPUs, memory, or specific cloud instances used for their training or inference. |
| Software Dependencies | No | This is the default numerical rank computation method in the NumPy package. The paper mentions the NumPy package for numerical rank computation, but does not provide a specific version number. Other software such as DANN, BERT, and ResNet are mentioned as models or architectures, not as specific software libraries with version numbers used for implementation. |
| Experiment Setup | Yes | DANN uses a one-layer ReLU neural network. This is the Shallow DANN architecture suggested by the original authors (Ganin et al., 2016). We swept over values of 128, 256, 512, and 1024 for the width of the hidden layer. The neural network is trained for 10 epochs using SGD with batch size 32, learning rate 0.01, and momentum 0.9. ... Candidate hyperparameter values for Label Alignment Regularizer were {1e−1, 1e+1, 1e+3} for λ and {8, 16, 32, ...} up to the rank of Φ or Φ̃ for k and k̃. ... The linear model is trained using full-batch gradient descent for 5000 epochs with learning rate 1/(2σ1). ... For the domain-adversarial baseline (Conneau et al., 2018) we sweep over values of {1e−3, 1e−2, 1e−1} for β. ... The models used in No Adaptation, Adv Refine, and Label Alignment Regression are linear regression or logistic regression... These models are trained with learning rate 1/(2σ1) (MSE loss) and 1e−2 (CE loss) and momentum 0.9. For CDAN and f-DAL we sweep over regularization coefficients {1e−4, 1e−2, 1} with a one-hidden-layer ReLU network. |
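
The pseudo-code quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal, unofficial sketch, not the authors' released code (see the linked GitHub repository for that): the function name, argument names, and the default step size 1/(2σ1²) (taken here as the reciprocal of twice the top eigenvalue of ΦᵀΦ, a stable choice for the quadratic term) are illustrative assumptions, and the sign convention follows the paper's description of removing implicit regularization on the source spectrum while adding a λ-weighted term on the target spectrum.

```python
import numpy as np

def label_alignment_regression(Phi, y, Phi_t, t=5000, alpha=None, k=8, k_t=8, lam=1.0):
    """Sketch of Algorithm 1 (Label Alignment Regression).

    Phi   : (n, d) source features,  y : (n,) source labels
    Phi_t : (m, d) unlabeled target features
    Gradient descent on
        ||Phi w - y||^2
        - sum_{i>k}   sigma_i^2   (w . v_i)^2      # remove source implicit reg.
        + lam * sum_{i>k_t} sigma~_i^2 (w . v~_i)^2  # add target reg.
    """
    d = Phi.shape[1]
    # Eigendecomposition of the covariance matrices; eigh returns eigenvalues
    # in ascending order, and the eigenvalues of Phi^T Phi are sigma_i^2.
    sig2, V = np.linalg.eigh(Phi.T @ Phi)
    sig2, V = sig2[::-1], V[:, ::-1]              # descending order
    sig2_t, V_t = np.linalg.eigh(Phi_t.T @ Phi_t)
    sig2_t, V_t = sig2_t[::-1], V_t[:, ::-1]

    # Trailing directions i = k+1, ..., d (0-indexed slices).
    V_lo,  s_lo  = V[:, k:],    sig2[k:]
    Vt_lo, st_lo = V_t[:, k_t:], sig2_t[k_t:]

    if alpha is None:
        # Illustrative stable default: 1 / (2 * top eigenvalue of Phi^T Phi).
        alpha = 1.0 / (2.0 * sig2[0])

    w = np.zeros(d)
    for _ in range(t):
        grad = 2.0 * Phi.T @ (Phi @ w - y)                 # fit term
        grad -= 2.0 * V_lo @ (s_lo * (V_lo.T @ w))         # removed source reg.
        grad += 2.0 * lam * Vt_lo @ (st_lo * (Vt_lo.T @ w))  # added target reg.
        w -= alpha * grad
    return w
```

Setting k (or k̃) equal to the rank of Φ (or Φ̃) makes the corresponding sum empty, which matches the sweep "up to the rank of Φ or Φ̃" described in the Experiment Setup row.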