Loss factorization, weakly supervised learning and label noise robustness

Authors: Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

ICML 2016

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM response: "The theory is validated by experiments in which we call the adapted SGD as a black box. ... We analyze experimentally the theory so far developed. ... The next results are based on UCI data. We learn with logistic loss, without model's intercept and set λ = 10^-6 and T = 4·2m (4 epochs). We measure d_clean and R_{D,01}, injecting symmetric label noise p ∈ [0, 0.45) and averaging over 25 runs. ... We conclude with a systematic study of hold-out error of µSGD. The same datasets are now split in 1/5 test and 4/5 training sets once at random. In contrast with the previous experimental setting we perform cross-validation of λ ∈ 10^{-3,...,+3} on 5 folds in the training set. We compare with vanilla SGD run on corrupted sample S and measure the gain from estimating µ̂_S. ... Table 2 reports test error for SGD and µSGD over 25 trials of artificially corrupted datasets."

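For concreteness, the symmetric label noise model quoted above flips each binary label independently with probability p. A minimal sketch of the injection step (the function name and the loop over noise rates are ours, not the authors' code):

```python
import numpy as np

def inject_symmetric_noise(y, p, rng):
    """Flip each label in y (values in {-1, +1}) independently with
    probability p: the symmetric label noise model quoted above."""
    flips = rng.random(y.shape[0]) < p
    return np.where(flips, -y, y)

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=1000)
for p in np.arange(0.0, 0.45, 0.05):      # noise rates p in [0, 0.45)
    y_noisy = inject_symmetric_noise(y, p, rng)
    print(f"p={p:.2f}  empirical flip rate={np.mean(y_noisy != y):.3f}")
```
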
Researcher Affiliation: Collaboration
Giorgio Patrini (1,2), Frank Nielsen (3,4), Richard Nock (2,1), Marcello Carioni (5). Affiliations: 1: Australian National University; 2: Data61; 3: Ecole Polytechnique; 4: Sony Computer Science Laboratories Inc.; 5: Max Planck Institute for Mathematics in the Sciences.

Pseudocode: Yes
LLM response: "Algorithm 1: µSGD ... Algorithm 2: µSGD applied on noisy labels"

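The two algorithms are not reproduced in this report, but the paper's core idea is compact enough to sketch. For a linear-odd loss such as the logistic loss, ℓ(z) − ℓ(−z) = −z, so the empirical risk factors into a label-free even part plus a linear term in the mean operator µ_S = (1/m) Σ_i y_i x_i; labels enter the objective only through µ_S, and under symmetric noise of rate p the rescaling µ̂ = µ_{noisy}/(1 − 2p) gives an unbiased estimate of the clean mean operator. The sketch below is ours, assuming the Pegasos-style step size η_t = 1/(λt) of Shalev-Shwartz et al. (2011) that the quoted setup points to; `mu_sgd` and its arguments are hypothetical names, not the authors' code.

```python
import numpy as np

def mu_sgd(X, y, lam=1e-6, epochs=4, noise_rate=None, seed=0):
    """Sketch of muSGD for the logistic (linear-odd) loss.

    Factorization used:  (1/m) sum_i l(y_i w.x_i)
      = (1/2m) sum_i [l(w.x_i) + l(-w.x_i)]  -  (1/2) w . mu_S,
    where mu_S = (1/m) sum_i y_i x_i is the mean operator, the only
    place labels appear (Algorithm 1). Under symmetric label noise of
    rate p, E[mu on noisy labels] = (1 - 2p) mu_clean, so dividing by
    (1 - 2p) de-noises the estimate (Algorithm 2).
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    mu = (y[:, None] * X).mean(axis=0)        # mean operator mu_S
    if noise_rate is not None:
        mu = mu / (1.0 - 2.0 * noise_rate)    # noise-corrected estimate
    w = np.zeros(d)
    T = epochs * 2 * m                        # T = 4 * 2m in the quoted setup
    for t in range(1, T + 1):
        i = rng.integers(m)
        z = w @ X[i]
        # gradient of the label-free even part l(z) + l(-z); for the
        # logistic loss it equals x * tanh(z/2), which is numerically stable
        g_even = X[i] * np.tanh(0.5 * z)
        grad = 0.5 * g_even - 0.5 * mu + lam * w   # plus L2 regularizer
        w -= grad / (lam * t)                 # Pegasos step size 1/(lam * t)
    return w

# Toy usage: corrupt labels at rate p = 0.2, then run Algorithm 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
y_noisy = np.where(rng.random(200) < 0.2, -y, y)
w = mu_sgd(X, y_noisy, noise_rate=0.2)       # de-noised mean operator
```

Everything in the inner loop is label-free; switching from Algorithm 1 to Algorithm 2 changes only how µ_S is estimated.
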
Open Source Code: No
LLM response: No statement or link is provided regarding the availability of open-source code for the described methodology.

Open Datasets: No
LLM response: "The next results are based on UCI data."

Dataset Splits: Yes
LLM response: "The same datasets are now split in 1/5 test and 4/5 training sets once at random. In contrast with the previous experimental setting we perform cross-validation of λ ∈ 10^{-3,...,+3} on 5 folds in the training set."

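The quoted protocol is a single random 1/5 test, 4/5 train split followed by 5-fold cross-validation over λ ∈ {10^-3, ..., 10^+3}. A sketch of that selection loop, using placeholder data and a regularized square-loss solver as a stand-in for the learner (in the paper this slot is filled by SGD or µSGD on the logistic loss; `fit_square_loss` and `zero_one_error` are our names):

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def fit_square_loss(X, y, lam):
    """Regularized square-loss stand-in for the learner."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)

def zero_one_error(w, X, y):
    return np.mean(np.sign(X @ w) != y)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                     # placeholder data
y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=500))

# Single random 1/5 test, 4/5 training split, as quoted.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

folds = KFold(n_splits=5, shuffle=True, random_state=0)
lambdas = 10.0 ** np.arange(-3, 4)                 # lambda in 10^{-3,...,+3}
cv_err = {lam: np.mean([zero_one_error(fit_square_loss(X_tr[tr], y_tr[tr], lam),
                                       X_tr[va], y_tr[va])
                        for tr, va in folds.split(X_tr)])
          for lam in lambdas}
best = min(cv_err, key=cv_err.get)
w = fit_square_loss(X_tr, y_tr, best)
print(f"best lambda={best:g}  test error={zero_one_error(w, X_te, y_te):.3f}")
```
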
Hardware Specification: No
LLM response: No specific hardware details (such as GPU/CPU models, processor types, or memory amounts) are mentioned for the experimental setup.

Software Dependencies: No
LLM response: "We learn with logistic loss... The learning rate η is untouched from Shalev-Shwartz et al. (2011)... We learn with λ = 10^-6 by standard square loss."

Experiment Setup: Yes
LLM response: "We learn with logistic loss, without model's intercept and set λ = 10^-6 and T = 4·2m (4 epochs). ... we perform cross-validation of λ ∈ 10^{-3,...,+3} on 5 folds in the training set."