Learning from Noisy Pairwise Similarity and Unlabeled Data

Authors: Songhua Wu, Tongliang Liu, Bo Han, Jun Yu, Gang Niu, Masashi Sugiyama

JMLR 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
"In this section, we experimentally investigate the behavior of the proposed method and the baselines for nSU classification on both synthetic and benchmark datasets. All experiments were conducted with 3.10GHz Intel(R) Core(TM) i9-9900 CPU and NVIDIA 2080Ti."
Researcher Affiliation: Academia
Songhua Wu and Tongliang Liu: Sydney AI Centre, The University of Sydney, Sydney, Australia
Bo Han: Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Jun Yu: Department of Automation, University of Science and Technology of China, Hefei, China
Gang Niu: Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
Masashi Sugiyama: Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan, and Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
Pseudocode: Yes
Algorithm 1: nSU classification.
  Input: noisy similar data pairs Ds and unlabeled data Du.
  Output: the classifier ˆf.
  Stage 1. Estimate the similar rate πs and the noise rate ρd:
    intermediate parameters (γ, κ) = MPE(Ds, Du);
    compute (πs, ρd, π+, π−) from (γ, κ).
  Stage 2. Obtain the classifier ˆf:
    if squared loss: compute the analytical solution ˆw by Eq. (19);
    if logistic loss: approximate the optimal classifier f by SGD.
  return ˆf.
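The squared-loss branch of Stage 2 admits a closed-form solution. As a hedged sketch only, the snippet below illustrates that kind of analytical step with a generic regularized least-squares solve; `squared_loss_solution` is a hypothetical stand-in and is not the paper's actual Eq. (19), whose corrected targets depend on the estimated rates from Stage 1.

```python
import numpy as np

def squared_loss_solution(X, z, lam=1e-4):
    """Generic regularized least-squares solve: a sketch of an analytical
    squared-loss step, NOT the paper's Eq. (19). X: (n, d) features,
    z: (n,) targets, lam: L2 regularization strength."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ z)

# Tiny demo on separable synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.where(rng.random(200) < 0.5, 1.5, -1.5)[:, None]
y = np.sign(X[:, 0] + X[:, 1])
w = squared_loss_solution(X, y)
pred = np.sign(X @ w)
```

The closed-form solve is what makes the squared-loss branch attractive: no iterative optimization is needed, in contrast to the SGD path used for the logistic loss.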
Open Source Code: No
The paper provides no links to code repositories and does not state that code for the described methodology is available, either as open source or in supplementary materials.
Open Datasets: Yes
"Here datasets were obtained from the LIBSVM data (Chang and Lin, 2011) and UCI Machine Learning Repository (Dua and Graff, 2017). ... SMS Spam (Almeida et al., 2011) is a public set of short message service (SMS) labeled messages... News20 is a collection of approximately 20,000 newsgroup documents... CIFAR-10 (Krizhevsky et al., 2009) has 32×32×3 color images including 50,000 training images and 10,000 test images of 10 classes."
Dataset Splits: Yes
"To obtain nSU data, first, we collected raw binary classification datasets which consist of positive and negative data, while leaving 10% of the data as test data. ... For all the experiments, the sample size of the noisy similar data pairs was fixed to 4000, while the sample size of the unlabeled data was fixed to 2000."
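The split described above (10% held-out test data, a fixed number of similar pairs, a fixed number of unlabeled points) can be sketched as follows. This is an assumption-laden illustration: `make_nsu_sample` is a hypothetical helper, same-label sampling is one plausible way to form "similar" pairs, and the similarity-noise injection that makes the data "noisy" is omitted.

```python
import numpy as np

def make_nsu_sample(X, y, n_pairs=4000, n_unlabeled=2000, test_frac=0.10, seed=0):
    """Hypothetical sketch of the data construction: hold out test_frac of
    the data as a test set, then draw same-label index pairs and unlabeled
    indices from the remaining pool. Noise on the similarity labels (the
    'noisy' part of nSU) is deliberately omitted here."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_frac * len(X))
    test_idx, pool = idx[:n_test], idx[n_test:]
    pair_idx = []
    for _ in range(n_pairs):
        i = rng.choice(pool)                 # first point of the pair
        same = pool[y[pool] == y[i]]         # candidates sharing its label
        pair_idx.append((i, rng.choice(same)))
    unlabeled_idx = rng.choice(pool, size=n_unlabeled, replace=True)
    return pair_idx, unlabeled_idx, test_idx
```

With the paper's settings this would be called as `make_nsu_sample(X, y, n_pairs=4000, n_unlabeled=2000, test_frac=0.10)`.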
Hardware Specification: Yes
"All experiments were conducted with 3.10GHz Intel(R) Core(TM) i9-9900 CPU and NVIDIA 2080Ti."
Software Dependencies: No
"For unsupervised learning methods, we directly used the implementations on scikit-learn (Pedregosa et al., 2011)." While scikit-learn is named, no version number is specified, and no other software versions are mentioned.
Experiment Setup: Yes
"The regularization parameter λ was fixed to 10^-4. For the deep model, we employed a 3-layer MLP (multilayer perceptron) with the softsign activation function (softsign(x) = x/(1 + |x|)). We used the stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.002, which decays every 40 epochs by a factor of 0.1 with 200 epochs in total."
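The two fully specified ingredients of this setup, the softsign activation and the step-decay learning-rate schedule (initial rate 0.002, multiplied by 0.1 every 40 epochs over 200 epochs), are easy to express directly. A minimal sketch; the helper names are my own:

```python
def softsign(x):
    """Softsign activation used for the 3-layer MLP: x / (1 + |x|)."""
    return x / (1 + abs(x))

def lr_at_epoch(epoch, lr0=0.002, decay_every=40, factor=0.1):
    """Step-decay schedule: multiply the rate by `factor` every
    `decay_every` epochs, starting from `lr0`."""
    return lr0 * factor ** (epoch // decay_every)
```

For example, `lr_at_epoch(0)` gives 0.002, and by epoch 80 (two decay steps) the rate has dropped to 2e-5; over the full 200 epochs the schedule decays four times.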