A Unified View of Double-Weighting for Marginal Distribution Shift
Authors: José I. Segovia-Martín, Santiago Mazuelas, Anqi Liu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, the proposed methods achieve enhanced classification performance in both synthetic and empirical experiments. (...) 6 Experiments This section shows experimental results for the proposed approaches in comparison with the state-of-the-art methods. (...) 6.1 Experiments for single-source label shift adaptation (...) 6.2 Experiments for multi-source covariate shift adaptation (...) 6.3 Experiments for multi-source label shift adaptation |
| Researcher Affiliation | Academia | José I. Segovia-Martín EMAIL Basque Center for Applied Mathematics (BCAM) Bilbao, Spain Santiago Mazuelas EMAIL Basque Center for Applied Mathematics (BCAM) IKERBASQUE-Basque Foundation for Science Bilbao, Spain Anqi Liu EMAIL CS department, Whiting School of Engineering Johns Hopkins University Baltimore, Maryland, USA |
| Pseudocode | Yes | 5 Practical Algorithm and Implementation This section describes the implementation of the proposed techniques for double-weighting label shift (DW-LS), double-weighting multi-source (MS) covariate shift (DW-MSCS), and double-weighting MS label shift (DW-MSLS) detailed in Algorithm 1, Algorithm 2 and Algorithm 3, respectively. Algorithm 1 The proposed algorithm for label shift adaptation: DW-LS Algorithm 2 The proposed algorithm for multi-source covariate shift adaptation: DW-MSCS Algorithm 3 The proposed algorithm for multi-source label shift adaptation: DW-MSLS |
| Open Source Code | Yes | The source code for the methods and the experimental setup presented are publicly available in https://github.com/MachineLearningBCAM/Unified-Double-Weighting-TMLR-2025. |
| Open Datasets | Yes | In the second set of experiments, we assess the performance of the proposed methods in comparison with existing techniques using real datasets publicly available in the UCI repository Dua & Graff (2017). (...) We consider Spam Detection, 20 Newsgroups, and Sentiment classification datasets. (...) 20 Newsgroups, available at http://qwone.com/~jason/20Newsgroups/, Sentiment Analysis, available at https://www.cs.jhu.edu/~mdredze/datasets/sentiment/, and Spam detection, available at http://www.ecmlpkdd2006.org/challenge.html. (...) DomainNet, available at https://ai.bu.edu/M3SDA/, (Peng et al., 2019), and Office-31, available at https://github.com/jindongwang/transferlearning/blob/master/data/dataset.md. |
| Dataset Splits | Yes | In addition, for each type of label shift (value of δ), we carried out 200 random repetitions with 100 training samples and 100 testing samples. (...) In the tweak-one shift, the training distribution is uniform over the set of possible labels, ptr(y) = 1/|Y|, while in the testing distribution, we assign probability pte(y) = δ to half of the classes (rounded up). We set δ = 0.05 for 10 repetitions and δ = 0.10 for another 10 repetitions. In the knock-out shift, the testing distribution is uniform over the set of possible labels, while in the training distribution, we remove a proportion δ of the samples from the selected classes. We set δ = 0.9 and select half of the classes (rounded up) for all 20 repetitions. (...) We carried out 100 random repetitions with 200 samples from each source and 200 testing samples and considered linear feature mapping. (...) For the experiments using the Sentiment dataset, (...) We randomly sample 1,000 training samples from each source and 150 testing samples in each repetition. For the experiments using the DomainNet dataset, (...) randomly sample 100 training samples from each source and 200 testing samples in each repetition. For the experiments using the Office-31 dataset, (...) randomly sample half of the samples from each domain as the training set and the remaining half as the testing set, ensuring the same number of samples from each domain, in each repetition. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processors, or memory amounts used for running the experiments. It mentions using 'pretrained ResNets to map images into feature vectors' but does not specify the hardware these models ran on. |
| Software Dependencies | No | The paper mentions several methods and tools like 'KMM', 'BBSE', 'RLLS', 'MLLS', '2SW-MDA', 'MS-DRL', 'CW KMM', and 'ResNet' models. However, it does not specify the version numbers for these software components or any programming language versions used to implement them, which is necessary for reproducibility. |
| Experiment Setup | No | The paper mentions how hyperparameters D and λ are determined ('We select the value of D to achieve the lowest minimax risk R(U)', 'hyperparameters $\{\lambda_s\}_{s=1}^S$ are determined solving $\min_{p,\lambda_s} \mathbf{1}^\top \lambda_s$') and that 'For the existing methods, we consider the default hyperparameter values provided by the authors.' However, it does not provide concrete values for crucial training hyperparameters such as learning rate, batch size, number of epochs, or the specific optimizer settings used for the models (e.g., logistic regression, ResNets) in its experiments. |
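The tweak-one and knock-out shift protocols quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration under assumptions the report does not pin down (function names, the convention of selecting the first ⌈|Y|/2⌉ classes, and spreading the remaining probability mass uniformly are all hypothetical, not from the paper):

```python
import numpy as np

def tweak_one_shift(n_classes, delta):
    """Test-label distribution for a 'tweak-one' shift: probability delta
    is assigned to each class in a selected half of the classes (rounded
    up); the remaining mass is spread uniformly over the other classes.
    Training labels stay uniform, p_tr(y) = 1/|Y|."""
    k = int(np.ceil(n_classes / 2))               # half the classes, rounded up
    p_te = np.full(n_classes, (1.0 - k * delta) / (n_classes - k))
    p_te[:k] = delta                              # selected classes get delta
    return p_te

def knock_out_shift(y_train, delta, rng):
    """Training-set 'knock-out' shift: drop a proportion delta of the
    samples from a selected half of the classes (rounded up); the test
    distribution stays uniform over the labels. Returns a boolean mask
    of training samples to keep."""
    classes = np.unique(y_train)
    k = int(np.ceil(len(classes) / 2))
    keep = np.ones(len(y_train), dtype=bool)
    for c in classes[:k]:
        idx = np.flatnonzero(y_train == c)
        drop = rng.choice(idx, size=int(delta * len(idx)), replace=False)
        keep[drop] = False
    return keep
```

For example, with |Y| = 10 and δ = 0.05 (the first tweak-one setting quoted above), five classes receive probability 0.05 each and the other five share the remaining 0.75; with δ = 0.9, the knock-out mask retains only 10% of the samples in each selected class.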