Sparse Autoencoders, Again?

Authors: Yin Lu, Xuening Zhu, Tong He, David Wipf

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we empirically compare VAEase against salient baselines on both synthetic and real-world datasets. Full experimental details, including data generation processes and model settings, are deferred to Appendix D."
Researcher Affiliation | Collaboration | School of Data Science, Fudan University; Amazon Web Services. "Correspondence to: Yin Lu <EMAIL>, Xuening Zhu <EMAIL>, Tong He <EMAIL>, David Wipf <EMAIL>."
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "A link to the code is here. ... As for data, online code is available for gathering the intermediate activation layers." (https://github.com/HoagyC/sparse_coding)
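The activation-gathering step the response refers to can be sketched as follows. This is an illustrative, framework-free sketch, not the linked repository's code (which captures intermediate activations from a language model, typically via PyTorch forward hooks); the function name and structure here are assumptions.

```python
def run_with_activations(layers, x):
    """Run input x through a sequence of layer functions, recording
    every intermediate activation along the way.

    Illustrative sketch only: in practice `layers` would be the blocks
    of a trained network and `x` a batch of tokens or images.
    """
    activations = []
    for layer in layers:
        x = layer(x)          # apply one layer
        activations.append(x)  # record its output for later SAE training
    return x, activations


# Usage with toy "layers" (plain functions standing in for network blocks):
output, acts = run_with_activations([lambda v: v + 1, lambda v: v * 2], 1)
```

The recorded `acts` list is what a sparse autoencoder would then be trained to reconstruct.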
Open Datasets | Yes | "We apply this approach to MNIST (Deng, 2012) and Fashion MNIST (Xiao et al., 2017) image datasets... Pile-10k dataset (Gao et al., 2020)... The Yelp dataset was obtained from Hugging Face."
Dataset Splits | No | The paper mentions training and testing on datasets like MNIST, Fashion MNIST, Pile-10k, and Yelp, and refers to "test samples" and "new test points not seen during training". However, it does not explicitly provide percentages or sample counts for train/validation/test splits, nor clear references to predefined benchmark splits for these datasets.
Hardware Specification | No | The paper describes model architectures, training parameters, and datasets used for experiments, but does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud computing instances) used to run these experiments.
Software Dependencies | No | The paper mentions using specific optimizers like Adam and refers to code repositories (e.g., PyTorch-GAN), implying the use of certain frameworks. However, it does not provide version numbers for any software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | "Models were trained for 150 epochs on the linear dataset and 310 epochs on the MLP dataset. The batch size was set to 1024. We choose κ = 20 for the linear dataset and κ = 60 for the MLP dataset. The learning rate for VAE models was 0.01 on the linear dataset and 0.005 on the MLP dataset, while the learning rates for SAE models are 0.002 on the linear dataset and 0.005 on the MLP dataset. The optimizer is Adam and the learning rate scheduler is Cosine Annealing Warm Restarts with T0 = 10. The penalty weights for SAE models in (1) are {λ1 = 1e-3, λ2 = 1e-4} on the linear dataset. On the MLP dataset, the weights are {λ1 = 5e-4, λ2 = 1e-5} and {λ1 = 5e-6, λ2 = 1e-5} for the SAE-ℓ1 and SAE-log models."
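The reported scheduler (Cosine Annealing Warm Restarts with T0 = 10) follows the SGDR schedule: within each restart cycle of length T_i, the learning rate decays from the base rate to eta_min along a half cosine, then jumps back up. A minimal stdlib-only sketch of the per-epoch rate, using the paper's T0 = 10 and the VAE linear-dataset base rate 0.01 as example values (the function name and defaults here are illustrative, not the authors' code):

```python
import math

def cosine_annealing_warm_restarts(epoch, base_lr, t0=10, t_mult=1, eta_min=0.0):
    """Learning rate at integer `epoch` under SGDR-style warm restarts.

    Each cycle i has length t_i = t0 * t_mult**i; within a cycle the rate
    follows eta_min + (base_lr - eta_min)/2 * (1 + cos(pi * t_cur / t_i)).
    """
    # Locate the current cycle and the position t_cur within it.
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Usage: with t0=10, the rate starts at 0.01, reaches ~0.005 mid-cycle,
# and restarts to 0.01 at epoch 10.
lr_start = cosine_annealing_warm_restarts(0, 0.01)
lr_mid = cosine_annealing_warm_restarts(5, 0.01)
lr_restart = cosine_annealing_warm_restarts(10, 0.01)
```

PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)` implements the same schedule, which is presumably what the authors used.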