Heterogeneous Label Shift: Theory and Algorithm
Authors: Chao Xu, Xijia Tang, Chenping Hou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmarks for cross-modal classification validate the effectiveness and practical relevance of the proposed approach. In summary, the contributions of our paper are listed as follows. We introduce and investigate a novel learning problem, namely, Heterogeneous Label Shift (HLS), which is rarely studied and arises in many real application areas. To our knowledge, this may be the first attempt at knowledge transfer in this scenario of simultaneously heterogeneous feature spaces and shifted label distributions with a theoretical guarantee. We present a novel error decomposition theorem that directly suggests a bound minimization HLS framework. Motivated by the theoretical analysis, we devise a Heterogeneous Label Shift Adversarial Network (HLSAN) algorithm as an illustration within the framework. Comprehensive experimental studies demonstrate the effectiveness of our proposal on multiple benchmarks with varying degrees of shifts for different types of cross-modal classification tasks. |
| Researcher Affiliation | Academia | 1College of Science, National University of Defense Technology, Changsha, 410073, China. |
| Pseudocode | Yes | The HLSAN algorithm is summarized in Algorithm 1. Algorithm 1 HLSAN Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository. |
| Open Datasets | Yes | We design cross-modal knowledge transfer tasks using two real-world datasets: the Multilingual Reuters Collection (Li et al., 2014) and Wikipedia (Fang et al., 2023). Detailed descriptions of these tasks are provided in Appendix B.4. Wikipedia1 is sourced from Wikipedia feature articles and contains 2,866 image-text pairs across 10 semantic categories... 1http://www.svcl.ucsd.edu/projects/crossmodal/ Multilingual Reuters Collection2 contains over 11,000 news articles spanning six categories across five languages... 2http://multilingreuters.iit.nrc.ca/Reuters Multi Lingual Multi View.htm |
| Dataset Splits | No | To simulate shifted label distributions in the benchmark datasets, we employ the Dirichlet shift approach proposed in (Guo et al., 2020). In this setup, the target dataset is assumed to follow a uniform label distribution, achieved through sampling, while the source domain label distribution is altered. Specifically, the source label distribution is sampled from a Dirichlet distribution parameterized by the concentration factor γ. Experiments are conducted under three label distribution shift settings, γ = 2, γ = 5, and γ = 10. For a fair comparison, we provide only a minimal amount of labeled target domain data (one sample per class) to the SsHDA methods, closely approximating the unsupervised setting. |
| Hardware Specification | No | No specific hardware details (like GPU or CPU models, or cloud computing instances with specifications) are mentioned in the paper for running the experiments. |
| Software Dependencies | No | The paper mentions that "All network parameters are optimized using SGD with momentum" and "The activation function is RELU," but it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor the version of the programming language used. |
| Experiment Setup | Yes | Network Architecture. In CMAN, the feature transformation networks Ts,θs and Tt,θt are instantiated as four-layer fully-connected neural networks. The label classifier hφ and domain discriminator dϕ are instantiated as three-layer fully-connected neural networks. All network parameters are optimized using SGD with momentum. The learning rate is set to 0.02, and the activation function is ReLU. Parameters Setting. There are two parameters α and β in HLSAN; the value ranges of α and β are set to [0.01, 0.05, 0.1, 0.5, 1, 5]. As for the comparison methods, we utilize the suggested default parameter settings provided by their original authors. In addition, the number of parallel instances is set to 100, i.e., np = 100. The warming-up epochs E1 = 20 and the starting-up epochs E2 = 50. For the Dirichlet shift, we draw Ps(Y) from a Dirichlet distribution with concentration parameter 10. |
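The Dirichlet shift protocol quoted above (target label distribution kept uniform, source label distribution drawn from Dir(γ) and imposed by subsampling) can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name `dirichlet_label_shift` and the subsampling strategy (scaling the sample budget so no class is drawn beyond its available examples) are our assumptions.

```python
import numpy as np

def dirichlet_label_shift(y_source, num_classes, gamma, rng=None):
    """Subsample a source set so its label distribution follows Dir(gamma).

    y_source: integer class labels of the original source set.
    gamma: Dirichlet concentration; larger values give distributions
           closer to uniform (the paper uses gamma in {2, 5, 10}).
    Returns (indices of the kept source examples, sampled class proportions).
    """
    rng = np.random.default_rng(rng)
    # Draw per-class proportions p ~ Dir(gamma * 1_K).
    p = rng.dirichlet(np.full(num_classes, gamma))
    idx_by_class = [np.flatnonzero(y_source == c) for c in range(num_classes)]
    # Largest total size n such that round(p_c * n) never exceeds
    # the number of available examples in class c.
    n = int(min(len(ix) / pc for ix, pc in zip(idx_by_class, p) if pc > 0))
    keep = np.concatenate([
        rng.choice(ix, size=int(round(pc * n)), replace=False)
        for ix, pc in zip(idx_by_class, p)
    ])
    return np.sort(keep), p
```

A usage sketch: with a balanced 6-class source pool (matching the six Reuters categories), `dirichlet_label_shift(y, 6, gamma=2.0)` yields a strongly skewed source label distribution, while `gamma=10.0` yields a mildly skewed one; the uniform target set is left untouched.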