The Complexity of Learning Sparse Superposed Features with Feedback

Authors: Akash Kumar

ICML 2025

Reproducibility (Variable: Result — LLM Response)
Research Type: Experimental — "We validate our theoretical findings through experiments on two distinct applications: feature recovery from Recursive Feature Machines and dictionary extraction from sparse autoencoders trained on Large Language Models."
Researcher Affiliation: Academia — "Department of Computer Science & Engineering, University of California, San Diego, USA. Correspondence to: Akash Kumar <EMAIL>."
Pseudocode: Yes — Algorithm 1 (Model of feature learning with feedback; Given: representation space V ⊆ ℝ^p, feature family M_F), Algorithm 2 (Feature learning with sampled representations; Given: representation space V ⊆ ℝ^p, distribution over representations D_V, feature family M_F), Algorithm 3 (Optimization via gradient descent).
Open Source Code: Yes — https://github.com/akashkumar-d/learnsparsefeatureswithfeedback.git
Open Datasets: Yes — "We validate our theoretical findings through experiments on two distinct applications: feature recovery from Recursive Feature Machines and dictionary extraction from sparse autoencoders trained on Large Language Models. ... dictionaries from trained sparse autoencoders on Pythia-70M (Biderman et al., 2023) and Board Game Models (Karvonen et al., 2024)"
Dataset Splits: No — "Inputs x ∈ ℝ^10 are sampled from a Gaussian distribution N(0, 0.5 I_10). We train an RFM classifier on 5000 training samples to obtain Φ, and the teaching agent has access to this feature matrix for generating feedback." The paper describes the training samples used to generate the target feature matrix, but it does not specify train/test/validation splits for evaluating the feedback-learning process itself.
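The data-generation step quoted above can be sketched as follows. This is a minimal reconstruction of the sampling only; the RFM classifier, its labels, and the resulting feature matrix Φ are not reproduced, and the random seed is an arbitrary choice.

```python
import numpy as np

# Sample n_train inputs x in R^10 from N(0, 0.5 * I_10), as quoted above.
# The seed and variable names are illustrative assumptions.
rng = np.random.default_rng(0)

d, n_train = 10, 5000          # input dimension and training-set size from the quote
cov = 0.5 * np.eye(d)          # covariance 0.5 * I_10
X_train = rng.multivariate_normal(np.zeros(d), cov, size=n_train)
```

An RFM classifier would then be fit on `X_train` (with task labels the report does not specify) to produce the target feature matrix used by the teaching agent.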
Hardware Specification: No — The paper does not describe the hardware used to run its experiments, such as specific GPU models, CPU models, or cloud resources.
Software Dependencies: No — "We utilize the cvxpy package to solve constraints... We use the publicly available repository for dictionary learning via sparse autoencoders on neural network activations (Marks et al., 2024a)." The paper names the cvxpy package and the Adam optimizer but gives no version numbers, and it points to a repository without pinning the specific software versions used for its methodology.
Experiment Setup: Yes — "Algorithm 3: Optimization via Gradient Descent ... L_reg(U) = λ‖U‖²_F, where B represents the batch of samples, λ = 10⁻⁴ is the regularization coefficient, and y = e_1 is the fixed unit vector. ... Update U using the Adam optimizer with gradient clipping. 4. Enforce fixed entries in U after each update (the entry U[0, 0] is fixed to 1)."
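The update step quoted above can be sketched as follows. Only the regularizer L_reg(U) = λ‖U‖²_F with λ = 10⁻⁴, gradient clipping, and the fixed entry U[0, 0] = 1 come from the paper's description; the task loss, the problem dimensions, and the hand-rolled Adam update are placeholder assumptions (the paper presumably uses a library optimizer).

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 5, 5
U = rng.standard_normal((p, k))
U[0, 0] = 1.0                      # fixed entry, per the quoted setup
U_init = U.copy()

lam = 1e-4                         # regularization coefficient lambda from the quote
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
clip_norm = 1.0                    # clipping threshold (assumption)
m, v = np.zeros_like(U), np.zeros_like(U)

target = np.eye(p, k)              # placeholder target for an illustrative quadratic loss

for t in range(1, 301):
    # gradient of the placeholder loss ||U - target||_F^2 plus the L2 regularizer
    grad = 2.0 * (U - target) + 2.0 * lam * U
    # gradient clipping by global norm
    gnorm = np.linalg.norm(grad)
    if gnorm > clip_norm:
        grad *= clip_norm / gnorm
    # minimal Adam update with bias correction
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    U -= lr * m_hat / (np.sqrt(v_hat) + eps)
    # enforce fixed entries in U after each update
    U[0, 0] = 1.0
```

Re-imposing `U[0, 0] = 1.0` after every optimizer step is the simplest way to honor a fixed entry without building the constraint into the parametrization.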