Sparse Coding with Multi-layer Decoders using Variance Regularization
Authors: Katrina Evtimova, Yann LeCun
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments with MNIST and natural image patches, we show that decoders learned with our approach have interpretable features both in the linear and multi-layer case. Moreover, we show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations when compared to autoencoders with linear dictionaries. Additionally, sparse representations obtained with our variance regularization approach are useful in the downstream tasks of denoising and classification in the low-data regime. |
| Researcher Affiliation | Collaboration | Katrina Evtimova (EMAIL), Center for Data Science, New York University; Yann LeCun (EMAIL), Courant Institute and Center for Data Science, New York University, and Meta AI (FAIR) |
| Pseudocode | Yes | Algorithm 1 (LISTA encoder E, forward pass). Input: image y ∈ R^d, number of iterations L. Parameters: U ∈ R^{l×d}, S ∈ R^{l×l}, b ∈ R^l. Output: sparse code z_E ∈ R^l. Steps: u = Uy + b; z_0 = ReLU(u); for i = 1 to L: z_i = ReLU(u + S z_{i−1}); z_E = z_L. |
| Open Source Code | Yes | Our PyTorch implementation is available on GitHub at https://github.com/kevtimova/deep-sparse. |
| Open Datasets | Yes | MNIST: In the first set of our experiments, we use the MNIST dataset (LeCun & Cortes, 2010) consisting of 28×28 hand-written digits and do standard pre-processing by subtracting the global mean and dividing by the global standard deviation. Natural Image Patches: For experiments with natural images, we use patches from ImageNet ILSVRC2012 (Deng et al., 2009). |
| Dataset Splits | Yes | We split the training data randomly into 55000 training samples and 5000 validation samples. The test set consists of 10000 images. We use 200000 randomly selected patches of size 28 28 for training, 20000 patches for validation, and 20000 patches for testing. |
| Hardware Specification | Yes | We train our models on one NVIDIA RTX 8000 GPU card and all our experiments take less than 24 hours to run. |
| Software Dependencies | No | The paper mentions a "PyTorch implementation" but does not provide specific version numbers for PyTorch or any other software dependencies. The reference to Adam (Kingma & Ba, 2014) is for an optimizer algorithm, not a software version. |
| Experiment Setup | Yes | We determine all hyperparameter values through grid search. Full training details can be found in Appendix B.1. The dimension of the latent codes in our MNIST experiments is l = 128 and in experiments with ImageNet patches it is l = 256. The batch size is set to 250, which we find sufficiently large for the regularization term on the variance of each latent component in (8) for VDL and VDL-NL models. We set the maximum number of FISTA iterations K to 200, which we find sufficient for good reconstructions. Table 3 contains the hyperparameter values we use in all our experiments except for the ones with WDL, WDL-NL, DO and DO-NL models. |
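The LISTA encoder pseudocode quoted in the table can be sketched in PyTorch, matching the paper's implementation language. This is a minimal illustration, not the authors' code (see their repository for the real implementation); the class name, initialization scale, and dimension arguments here are our own assumptions.

```python
import torch
import torch.nn as nn


class LISTAEncoder(nn.Module):
    """Sketch of Algorithm 1: LISTA encoder forward pass.

    Hypothetical implementation; d is the input dimension,
    l the latent code dimension, num_iters the iteration count L.
    """

    def __init__(self, d: int, l: int, num_iters: int):
        super().__init__()
        self.num_iters = num_iters
        # Parameters U in R^{l x d}, S in R^{l x l}, b in R^l
        # (small random init is an assumption for this sketch)
        self.U = nn.Parameter(torch.randn(l, d) * 0.01)
        self.S = nn.Parameter(torch.randn(l, l) * 0.01)
        self.b = nn.Parameter(torch.zeros(l))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: batch of flattened inputs, shape (batch, d)
        u = y @ self.U.T + self.b          # u = U y + b
        z = torch.relu(u)                  # z_0 = ReLU(u)
        for _ in range(self.num_iters):    # z_i = ReLU(u + S z_{i-1})
            z = torch.relu(u + z @ self.S.T)
        return z                           # z_E = z_L
```

With the paper's MNIST settings (28×28 inputs, so d = 784, and l = 128), the encoder maps a batch of flattened images to non-negative sparse codes; the ReLU keeps every latent component at or above zero.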