Mentored Learning: Improving Generalization and Convergence of Student Learner
Authors: Xiaofeng Cao, Yaming Guo, Heng Tao Shen, Ivor W. Tsang, James T. Kwok
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 8. Experiments To demonstrate our teaching idea of Section 5, we present the empirical studies for teaching-based hypothesis pruning of Section 5.3, and the self-improvement of teaching of Section 6. With their guarantees, we then present real-world studies for ATML of Section 7.1 and ATML+ of Section 7.2. Dataset We experimented with algorithms on 7 binary classification datasets: skin, shuttle, magic04, covtype, nomao, jm1 and mnist. ... 8.1 Empirical Studies We present the following empirical studies on six UCI binary classification datasets: 1) whether the teaching-based hypothesis pruning of ATML can prune the candidate hypothesis ... 8.2 Real-world Studies We present the performance of ATML and ATML+ in real-world studies. We first report the performance of ATML in the setting of white-box learner, where IWAL (Beygelzimer et al., 2009) and IWAL-D (Cortes et al., 2019b) are used as the baseline. ... Figure 4 presents the error rate of ĥ_T on the test dataset against the number of query labels (on log2 scale). ... Figure 6 presents the relationship of the test accuracy and the number of query labels, where ATML+ wins the traditional active learning baselines. |
| Researcher Affiliation | Academia | Xiaofeng Cao (...) School of Artificial Intelligence, Jilin University, Changchun 130012, China and Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), NSW 2007, Australia; Yaming Guo (...) School of Artificial Intelligence, Jilin University, Changchun 130012, China; Heng Tao Shen (...) School of Computer Science and Technology, Tongji University, Shanghai 201804, China and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Ivor W. Tsang (...) Centre for Frontier AI Research and Institute of High Performance Computing, Agency for Science, Technology, and Research (A*STAR), Singapore 138632, Singapore and College of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore; James T. Kwok (...) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR 999077, China |
| Pseudocode | Yes | Algorithm 1 ATML(H_T, h^T, T, n) — 1: Initialize: H_T^1 = H_T, h^T_1 = h^T, H̃^t = ∅; 2: for t ∈ [T] do ... Algorithm 2 ATML+(h^T, T, n) — 1: Initialize: Teacher h^T_1 = h^T, Learner ĥ_0; 2: for t ∈ [T] do ... |
| Open Source Code | No | The paper does not provide any specific statement about releasing the source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Dataset We experimented with algorithms on 7 binary classification datasets: skin, shuttle, magic04, covtype, nomao, jm1 and mnist. Table 2 shows the summary statistics for all datasets used in our experiment. We denote by N the number of samples, by Dim the number of features, and R is the relative size of the minority class. ... We present the following empirical studies on six UCI binary classification datasets: ... For mnist dataset, we set the digit 3 as the positive class and the digit 5 as the negative class. |
| Dataset Splits | Yes | For each dataset, we randomly select 50% of the data as the training set and approximate the generalization error of the teaching hypothesis by the empirical error on the remaining data. ... For each dataset, we randomly select 70% of the data as the training set and the remaining data as the test set. ... where 70% of dataset is randomly selected as the training set and the remaining data as the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions using a 'CNN network as a classifier' without further hardware specifications. |
| Software Dependencies | No | The paper mentions using a 'CNN network as a classifier' and optimization algorithms like 'stochastic gradient descent (SGD)', but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | In all algorithms, we use the CNN network as a classifier following the structure of convolution-relu-convolution-relu-max pooling-dropout-dense-relu-dropout-dense, with the loss function log(1 + exp(−y h(x))), normalized to [0, 1]. In ATML+, the teaching hypothesis is specified as a pre-trained CNN model. ... For all (x, y) ∈ X × Y, the loss function is written as ℓ(h(x), y) = log(1 + exp(−y h(x))), and we use the function g(ℓ(h(x), y)) = 2/(1 + exp(−ℓ(h(x), y))) − 1 to normalize the output of ℓ(h(x), y) to [0, 1]. ... In our empirical studies, we randomly generate 10,000 hyperplanes with bounded norms as the initial hypothesis class H_T and set the teaching hypothesis as the hypothesis with the minimum empirical error from H_T. |
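
The experiment-setup row quotes three concrete ingredients: the logistic loss ℓ(h(x), y) = log(1 + exp(−y h(x))), the normalization g(ℓ) = 2/(1 + exp(−ℓ)) − 1 mapping the loss into [0, 1], and the construction of the teaching hypothesis as the minimum-empirical-error member of 10,000 random bounded-norm hyperplanes. A minimal NumPy sketch of these pieces follows; the function names (`logistic_loss`, `normalize_loss`, `select_teaching_hypothesis`) and the Gaussian sampling of hyperplane directions are assumptions for illustration, not the paper's released implementation.

```python
import numpy as np

def logistic_loss(margin):
    """l(h(x), y) = log(1 + exp(-y*h(x))), taking margin = y*h(x)."""
    return np.log1p(np.exp(-margin))

def normalize_loss(loss):
    """g(l) = 2/(1 + exp(-l)) - 1: maps l in [0, inf) into [0, 1)."""
    return 2.0 / (1.0 + np.exp(-loss)) - 1.0

def select_teaching_hypothesis(X, y, n_hyp=10_000, norm_bound=1.0, seed=0):
    """Sample n_hyp hyperplanes with bounded norm (hypothetical sampling
    scheme) and return the one with the minimum normalized empirical error
    on labeled data (X, y), where y takes values in {-1, +1}."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((n_hyp, d))
    # Rescale each row so every hyperplane satisfies ||w|| = norm_bound.
    W *= norm_bound / np.linalg.norm(W, axis=1, keepdims=True)
    margins = y[:, None] * (X @ W.T)  # (N, n_hyp): y_i * <w_j, x_i>
    errs = normalize_loss(logistic_loss(margins)).mean(axis=0)
    return W[np.argmin(errs)]
```

Note that g(0) = 0 and g(ℓ) → 1 as ℓ → ∞, so the normalized error of each candidate hyperplane indeed lies in [0, 1), matching the quoted setup.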