Convex Deep Learning via Normalized Kernels
Authors: Özlem Aslan, Xinhua Zhang, Dale Schuurmans
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To investigate the potential of deep versus shallow convex training methods, and global versus local training methods, we implemented the approach outlined above for a three-layer model along with comparison methods. |
| Researcher Affiliation | Academia | Özlem Aslan, Dept. of Computing Science, University of Alberta, Canada (EMAIL); Xinhua Zhang, Machine Learning Group, NICTA and ANU (EMAIL); Dale Schuurmans, Dept. of Computing Science, University of Alberta, Canada (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Conditional gradient algorithm to optimize f(M1, M2) for M1, M2 ∈ M. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Here we tried to replicate the results of [25] on similar data sets, USPS and COIL from [41], Letter from [42], MNIST, and CIFAR-100 from [43]. |
| Dataset Splits | Yes | a given set of data (X, Y) is divided into separate training and test sets, (XL, YL) and XU, where labels are only included for the training set. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | This loss can be naturally interpreted using the remark following Postulate 1. It encourages that the propensity of example j with respect to itself, Sjj, should be higher than its propensity with respect to other examples, Sij, by a margin that is defined through the normalized kernel M. However note this loss does not correspond to a linear transfer between layers, even in terms of the propensity matrix S or normalized output kernel M. As in all large margin methods, the initial loss (12) is a convex upper bound for an underlying discrete loss defined with respect to a step transfer. |
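The conditional gradient (Frank-Wolfe) method named in Algorithm 1 optimizes a convex objective over a convex set of normalized kernel matrices by repeatedly moving toward the minimizer of a linearization. The sketch below is a minimal illustration over the set of PSD matrices with bounded trace, a common choice for such kernel-learning feasible sets; it is not the paper's exact Algorithm 1, and the feasible set, objective, and step-size rule here are assumptions.

```python
import numpy as np

def frank_wolfe_psd(grad_f, n, tau, steps=1000):
    """Conditional gradient over {M PSD : trace(M) <= tau}.

    Illustrative sketch only: the paper's Algorithm 1 optimizes
    f(M1, M2) over a different feasible set M.
    """
    M = np.zeros((n, n))
    for t in range(steps):
        G = grad_f(M)
        # Linear minimization oracle: the extreme point minimizing
        # <G, S> is tau * v v^T for v the eigenvector of G's smallest
        # eigenvalue (or the zero matrix if G is already PSD).
        w, V = np.linalg.eigh(G)
        if w[0] < 0:
            v = V[:, 0]
            S = tau * np.outer(v, v)
        else:
            S = np.zeros((n, n))
        gamma = 2.0 / (t + 2)  # standard diminishing step size
        M = (1 - gamma) * M + gamma * S
    return M
```

For example, with the toy objective f(M) = ||M - A||_F^2 (gradient 2(M - A)) and a PSD target A whose trace fits inside the ball, the iterates converge toward A at the usual O(1/t) rate.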
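The experiment-setup excerpt describes a large-margin loss that pushes each example's self-propensity S_jj above its cross-propensities S_ij by a margin defined through the normalized kernel M. The paper's exact Eq. (12) is not reproduced here, so the hinge form below, with M_jj - M_ij as a stand-in margin term, is an assumption meant only to make the "margin defined through M" idea concrete.

```python
import numpy as np

def margin_loss(S, M):
    """Hinge-style surrogate for the large-margin propensity loss.

    Assumed form (NOT the paper's exact Eq. (12)): for each pair
    i != j, penalize max(0, (M_jj - M_ij) - (S_jj - S_ij)), i.e.
    S_jj should exceed S_ij by a margin read off from M.
    """
    n = S.shape[0]
    total = 0.0
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            margin = M[j, j] - M[i, j]   # margin defined through M
            total += max(0.0, margin - (S[j, j] - S[i, j]))
    return total / n
```

When the diagonal of S dominates by more than the required margin the loss vanishes, matching the intent that the loss is a convex upper bound on a discrete step-transfer loss.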