Rethinking Knowledge Transfer in Learning Using Privileged Information
Authors: Danil Provodin, Bram van den Akker, Christina Katsimerou, Maurits Clemens Kaptein, Mykola Pechenizkiy
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments for a wide variety of application domains further demonstrate that state-of-the-art LUPI approaches fail to effectively transfer knowledge from PI. Thus, we advocate for practitioners to exercise caution when working with PI to avoid unintended inductive biases. ... We conduct experiments on four real-world datasets from various application domains and find that no improvement from the PI model is observed, which adds evidence to the limited contribution of LUPI in practical applications. |
| Researcher Affiliation | Collaboration | Danil Provodin (Eindhoven University of Technology); Bram van den Akker (Booking.com); Christina Katsimerou (Booking.com); Maurits Kaptein (Eindhoven University of Technology); Mykola Pechenizkiy (Eindhoven University of Technology) |
| Pseudocode | No | The paper describes methods like Generalized distillation and Marginalization with weight sharing using mathematical equations and descriptive text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of the experiments can be found at https://github.com/danilprov/rethinking_lupi. |
| Open Datasets | Yes | Repeat Buyers (Alibaba, 2024) ... Heart Disease (BRFSS, 2024) ... NASA-NEO (NASA, 2024) ... Smoker or Drinker (Soo, 2024) ... All datasets are distributed under CC BY-NC 4.0 license. |
| Dataset Splits | Yes | We perform a timestamp-based train test split and use 70% of data for training each model and 30% of data for reporting performance. |
| Hardware Specification | Yes | We distribute all runs across 6 CPU nodes (Intel(R) CPU i7-10750H) and 1 GPU Nvidia Quadro T1000 per run for experiments. |
| Software Dependencies | No | The paper mentions optimizers (rmsprop, Adam) and loss functions (mean squared error, cross-entropy) but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | For both Experiment 1 and Experiment 3, as the no-PI, student, and teacher models, we use 1 linear layer of dimension 50 with softmax activation. The networks were trained using an rmsprop optimizer with a mean squared error loss function. The temperature and imitation parameters for Generalized distillation were set to 1. ... For the MNIST and SARCOS experiments, we use two-layer fully connected neural networks of dimension 20, with ReLU hidden activations and softmax output activation for the no-PI, student, and teacher models. The networks were trained using an rmsprop optimizer with a mean squared error loss function. The temperature and imitation parameters for Generalized distillation in the MNIST experiment were set to 10 and 1, respectively, as the best parameter set from the original paper (Lopez-Paz et al., 2016). ... For both of them, as a no-PI model, we use two-layer fully connected neural networks of dimension 64, with tanh hidden activations and linear output activation for regression and sigmoid for classification. The TRAM model has an extra hidden layer of size 64 with a tanh activation function in the PI head. Both TRAM and no-PI networks are fit using the Adam optimizer (Kingma & Ba, 2017) with a mean squared error loss function. ... All models are trained for 50 epochs with a cross-entropy loss function and Adam optimizer with a base learning rate of 0.001, β1 = 0.9, β2 = 0.95, ϵ = 1e-07. All models are trained with L2 weight regularization with a decay weight of 0.1. |
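The paper's timestamp-based 70/30 split can be illustrated with a minimal sketch. The exact procedure is not spelled out in the quoted text, so the assumption here is a simple chronological cutoff: sort by timestamp, train on the earliest 70% of rows, test on the rest. The `timestamp_split` helper and the `rows` record layout are illustrative, not from the paper.

```python
from datetime import date, timedelta

def timestamp_split(rows, train_frac=0.7):
    """Chronological split: sort records by timestamp and take the earliest
    train_frac of them for training, the remainder for testing.
    (Sketch of a 'timestamp-based train test split'; the paper's exact
    procedure is an assumption here.)"""
    ordered = sorted(rows, key=lambda r: r["ts"])
    cutoff = int(len(ordered) * train_frac)
    return ordered[:cutoff], ordered[cutoff:]

# Toy usage with ten synthetic daily records (illustrative only).
rows = [{"ts": date(2024, 1, 1) + timedelta(days=i), "y": i} for i in range(10)]
train, test = timestamp_split(rows)
```

A chronological cutoff (rather than a random shuffle) avoids leaking future information into the training set, which matters for datasets like Repeat Buyers where labels depend on later behavior.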
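The setup for Experiments 1 and 3 (a single linear layer with softmax, MSE loss, and Generalized distillation with temperature and imitation parameters of 1) can be sketched as follows. This is a NumPy sketch, not the authors' code: the function names are hypothetical, and the exact form of the teacher-softening step is an assumption based on the standard Generalized distillation objective.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Temperature-scaled softmax with the usual max-shift for stability."""
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def linear_softmax_forward(x, W, b):
    """One linear layer with softmax output, as described for the
    no-PI, student, and teacher models in Experiments 1 and 3."""
    return softmax(x @ W + b)

def generalized_distillation_loss(student_probs, teacher_logits, y_onehot,
                                  imitation=1.0, temperature=1.0):
    """Convex mix of the hard-label MSE and the imitation MSE against the
    teacher's temperature-softened outputs. The paper sets both the
    temperature and imitation parameters to 1 for these experiments."""
    soft = softmax(teacher_logits, temperature)
    hard_term = np.mean((student_probs - y_onehot) ** 2)
    soft_term = np.mean((student_probs - soft) ** 2)
    return (1 - imitation) * hard_term + imitation * soft_term
```

With imitation set to 1, the student is trained purely against the teacher's (PI-informed) outputs; with imitation set to 0, the objective reduces to ordinary supervised training on the hard labels.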