Label Alignment Regularization for Distribution Shift
Authors: Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H.S. Torr, Yangchen Pan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we first design a synthetic dataset to verify that our regularizer is indeed beneficial in a distribution shift setting by adjusting the classifier and to perform an ablation study on the role of removing implicit regularization. Then, we demonstrate the effectiveness of our method on a well-known benchmark where classic domain-adversarial methods are known to fail (Zhao et al., 2019). Last, we show our algorithm's practical utility in a cross-lingual sentiment classification task. |
| Researcher Affiliation | Collaboration | Ehsan Imani EMAIL University of Alberta, Alberta Machine Intelligence Institute, Guojun Zhang EMAIL Huawei Noah's Ark Lab, Runjia Li EMAIL Department of Engineering Science, University of Oxford, Jun Luo EMAIL Huawei Noah's Ark Lab, Pascal Poupart EMAIL School of Computer Science, University of Waterloo, Philip H.S. Torr EMAIL Yangchen Pan EMAIL Department of Engineering Science, University of Oxford |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code. Algorithm 1, Label Alignment Regression: Get data Φ, y, Φ̃, and hyperparameters t, α, k, k̃, λ. Compute covariance matrices ΦᵀΦ and Φ̃ᵀΦ̃. Perform eigendecomposition of ΦᵀΦ and Φ̃ᵀΦ̃ to get σ_{k+1:d}, σ̃_{k̃+1:d}, v_{k+1:d}, and ṽ_{k̃+1:d}. Initialize w to zero. For t iterations: perform a gradient step with respect to w on ‖Φw − y‖² − Σ_{i=k+1}^{d} σ_i² (wᵀv_i)² + λ Σ_{i=k̃+1}^{d} σ̃_i² (wᵀṽ_i)² with step size α and update w. |
| Open Source Code | Yes | An implementation is available at https://github.com/EhsanEI/lar/. |
| Open Datasets | Yes | Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis. ... XED (Öhman et al., 2020) is a sentence-level sentiment analysis dataset... UCI CT Scan: A random subset of the CT Position dataset on UCI (Graf et al., 2011). Song Year: A random subset of the training portion of the Million Song dataset (Bertin-Mahieux et al., 2011). Bike Sharing: A random subset of the Bike Sharing dataset on UCI (Fanaee-T and Gama, 2014). MNIST: The task is classifying any pair of two digits in MNIST. USPS: The task is classifying any pair of two digits in USPS. CIFAR-10: The task is classifying airplane and automobile in CIFAR-10 dataset... CIFAR-100: The task is classifying beaver and dolphin in CIFAR-100 dataset... STL-10: The task is classifying airplane and bird in STL-10 dataset... AG News: A random subset of the first two classes (World and Sports) in AG News document classification dataset. |
| Dataset Splits | Yes | We use the train split of the dataset for the source domain and the test split of the other dataset for the target domain. A small set of 100 labeled points from the target domain is used for hyperparameter selection as we have not developed a fully unsupervised hyperparameter selection strategy. ... The last three columns are like the second column except that, in binary classification between two digits, only a certain ratio of the lower digit in the source domain, as indicated in the header, is used. This subsampling creates a large degree of imbalance that, as Zhao et al. (2019) observed, poses a challenge to domain-adversarial methods. ... We perform 5 runs and in each one 100 points are randomly sampled from the target domain for validation and the rest are used for evaluation. |
| Hardware Specification | No | The paper mentions 'features from a ResNet-18 pretrained on ImageNet' and '768-dimensional sentence embeddings obtained with BERT (Devlin et al., 2019) models pre-trained on the corresponding languages'. However, it does not specify what hardware was used to run their experiments, only what models/features were used or pretrained on. There are no explicit mentions of GPUs, CPUs, memory, or specific cloud instances used for their training or inference. |
| Software Dependencies | No | This is the default numerical rank computation method in the NumPy package. The paper mentions the NumPy package for numerical rank computation, but does not provide a specific version number. Other software such as DANN, BERT, and ResNet are mentioned as models or architectures, not as specific software libraries with version numbers used for implementation. |
| Experiment Setup | Yes | DANN uses a one-layer ReLU neural network. This is the Shallow DANN architecture suggested by the original authors (Ganin et al., 2016). We swept over values of 128, 256, 512, and 1024 for the width of the hidden layer. The neural network is trained for 10 epochs using SGD with batch size 32, learning rate 0.01, and momentum 0.9. ... Candidate hyperparameter values for Label Alignment Regularizer were {1e−1, 1e+1, 1e+3} for λ and {8, 16, 32, ...} up to the rank of Φ or Φ̃ for k and k̃. ... The linear model is trained using full-batch gradient descent for 5000 epochs with learning rate 1/(2σ1). ... For the domain-adversarial baseline (Conneau et al., 2018) we sweep over values of {1e−3, 1e−2, 1e−1} for β. ... The models used in No Adaptation, Adv Refine, and Label Alignment Regression are linear regression or logistic regression... These models are trained with learning rate 1/(2σ1) (MSE loss) and 1e−2 (CE loss) and momentum 0.9. For CDAN and f-DAL we sweep over regularization coefficients {1e−4, 1e−2, 1} with a one-hidden-layer ReLU network. |
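
The pseudo-code quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal, unofficial sketch, not the authors' released code (see the linked GitHub repository for that): the function name, argument names, and the default step size 1/(2σ1²) (taken here as the reciprocal of twice the top eigenvalue of ΦᵀΦ, a stable choice for the quadratic term) are illustrative assumptions, and the sign convention follows the paper's description of removing implicit regularization on the source spectrum while adding a λ-weighted term on the target spectrum.

```python
import numpy as np

def label_alignment_regression(Phi, y, Phi_t, t=5000, alpha=None, k=8, k_t=8, lam=1.0):
    """Sketch of Algorithm 1 (Label Alignment Regression).

    Phi   : (n, d) source features,  y : (n,) source labels
    Phi_t : (m, d) unlabeled target features
    Gradient descent on
        ||Phi w - y||^2
        - sum_{i>k}   sigma_i^2   (w . v_i)^2      # remove source implicit reg.
        + lam * sum_{i>k_t} sigma~_i^2 (w . v~_i)^2  # add target reg.
    """
    d = Phi.shape[1]
    # Eigendecomposition of the covariance matrices; eigh returns eigenvalues
    # in ascending order, and the eigenvalues of Phi^T Phi are sigma_i^2.
    sig2, V = np.linalg.eigh(Phi.T @ Phi)
    sig2, V = sig2[::-1], V[:, ::-1]              # descending order
    sig2_t, V_t = np.linalg.eigh(Phi_t.T @ Phi_t)
    sig2_t, V_t = sig2_t[::-1], V_t[:, ::-1]

    # Trailing directions i = k+1, ..., d (0-indexed slices).
    V_lo,  s_lo  = V[:, k:],    sig2[k:]
    Vt_lo, st_lo = V_t[:, k_t:], sig2_t[k_t:]

    if alpha is None:
        # Illustrative stable default: 1 / (2 * top eigenvalue of Phi^T Phi).
        alpha = 1.0 / (2.0 * sig2[0])

    w = np.zeros(d)
    for _ in range(t):
        grad = 2.0 * Phi.T @ (Phi @ w - y)                 # fit term
        grad -= 2.0 * V_lo @ (s_lo * (V_lo.T @ w))         # removed source reg.
        grad += 2.0 * lam * Vt_lo @ (st_lo * (Vt_lo.T @ w))  # added target reg.
        w -= alpha * grad
    return w
```

Setting k (or k̃) equal to the rank of Φ (or Φ̃) makes the corresponding sum empty, which matches the sweep "up to the rank of Φ or Φ̃" described in the Experiment Setup row.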