Sparse Autoencoders, Again?

Authors: Yin Lu, Xuening Zhu, Tong He, David Wipf

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we empirically compare VAEase against salient baselines on both synthetic and real-world datasets. Full experimental details, including data generation processes and model settings, are deferred to Appendix D."
Researcher Affiliation | Collaboration | School of Data Science, Fudan University; Amazon Web Services. "Correspondence to: Yin Lu <EMAIL>, Xuening Zhu <EMAIL>, Tong He <EMAIL>, David Wipf <EMAIL>."
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "A link to the code is here. ... As for data, online code is available for gathering the intermediate activation layers." (https://github.com/HoagyC/sparse_coding)
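The activation-gathering step the response refers to can be sketched as follows. This is an illustrative, framework-free sketch, not the linked repository's code (which captures intermediate activations from a language model, typically via PyTorch forward hooks); the function name and structure here are assumptions.

```python
def run_with_activations(layers, x):
    """Run input x through a sequence of layer functions, recording
    every intermediate activation along the way.

    Illustrative sketch only: in practice `layers` would be the blocks
    of a trained network and `x` a batch of tokens or images.
    """
    activations = []
    for layer in layers:
        x = layer(x)          # apply one layer
        activations.append(x)  # record its output for later SAE training
    return x, activations


# Usage with toy "layers" (plain functions standing in for network blocks):
output, acts = run_with_activations([lambda v: v + 1, lambda v: v * 2], 1)
```

The recorded `acts` list is what a sparse autoencoder would then be trained to reconstruct.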
Open Datasets | Yes | "We apply this approach to MNIST (Deng, 2012) and Fashion MNIST (Xiao et al., 2017) image datasets... Pile-10k dataset (Gao et al., 2020)... The Yelp dataset was obtained from Hugging Face."
Dataset Splits | No | The paper mentions training and testing on datasets like MNIST, Fashion MNIST, Pile-10k, and Yelp, and refers to "test samples" and "new test points not seen during training". However, it does not explicitly provide percentages or sample counts for train/validation/test splits, nor clear references to predefined benchmark splits for these datasets.
Hardware Specification | No | The paper describes model architectures, training parameters, and datasets used for experiments, but does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud computing instances) used to run these experiments.
Software Dependencies | No | The paper mentions using specific optimizers like Adam and refers to code repositories (e.g., PyTorch-GAN), implying the use of certain frameworks. However, it does not provide version numbers for any software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | "Models were trained for 150 epochs on the linear dataset and 310 epochs on the MLP dataset. The batch size was set to 1024. We choose κ = 20 for the linear dataset and κ = 60 for the MLP dataset. The learning rate for VAE models was 0.01 on the linear dataset and 0.005 on the MLP dataset, while the learning rates for SAE models are 0.002 on the linear dataset and 0.005 on the MLP dataset. The optimizer is Adam and the learning rate scheduler is Cosine Annealing Warm Restarts with T0 = 10. The penalty weights for SAE models in (1) are {λ1 = 1e-3, λ2 = 1e-4} on the linear dataset. On the MLP dataset, the weights are {λ1 = 5e-4, λ2 = 1e-5} and {λ1 = 5e-6, λ2 = 1e-5} for the SAE-ℓ1 and SAE-log models."
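The reported scheduler (Cosine Annealing Warm Restarts with T0 = 10) follows the SGDR schedule: within each restart cycle of length T_i, the learning rate decays from the base rate to eta_min along a half cosine, then jumps back up. A minimal stdlib-only sketch of the per-epoch rate, using the paper's T0 = 10 and the VAE linear-dataset base rate 0.01 as example values (the function name and defaults here are illustrative, not the authors' code):

```python
import math

def cosine_annealing_warm_restarts(epoch, base_lr, t0=10, t_mult=1, eta_min=0.0):
    """Learning rate at integer `epoch` under SGDR-style warm restarts.

    Each cycle i has length t_i = t0 * t_mult**i; within a cycle the rate
    follows eta_min + (base_lr - eta_min)/2 * (1 + cos(pi * t_cur / t_i)).
    """
    # Locate the current cycle and the position t_cur within it.
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Usage: with t0=10, the rate starts at 0.01, reaches ~0.005 mid-cycle,
# and restarts to 0.01 at epoch 10.
lr_start = cosine_annealing_warm_restarts(0, 0.01)
lr_mid = cosine_annealing_warm_restarts(5, 0.01)
lr_restart = cosine_annealing_warm_restarts(10, 0.01)
```

PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)` implements the same schedule, which is presumably what the authors used.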