Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Authors: Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL.
Researcher Affiliation | Academia | (1) National Key Laboratory of Radar Signal Processing, Xidian University, Xi'an, China; (2) Stony Brook University, New York, USA; (3) MIT CSAIL, MA, USA; (4) TU Munich.
Pseudocode | No | The paper describes methods through mathematical formulations and descriptive text, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The abstract states 'Code is available at this https URL.', but the URL itself is not provided in a functional format. Additionally, Section B.3 mentions 'The implementation of CSR is based on the codebase provided by OpenAI', referencing a third-party repository (https://github.com/openai/sparse_autoencoder) rather than a repository with the paper's specific implementation.
Open Datasets | Yes | For the image embedding experiment: ImageNet-1K (Deng et al., 2009), a large-scale visual database designed to provide researchers with a comprehensive resource for developing and evaluating computer vision models. For the text embedding experiment: all datasets mentioned can be found in MTEB (Muennighoff et al., 2022). For the multimodal embedding experiment: MS COCO (Lin et al., 2014), a large-scale object detection, segmentation, and captioning dataset, and Flickr30K (Young et al., 2014), a collection of images with corresponding textual descriptions.
Dataset Splits | Yes | In detail, we use the training set with 1.3M samples as the database and the validation set with 50K samples as the query set. We also report linear probing and few-shot results using Top-1 accuracy. For a holistic evaluation across different methods, Figure 1(c) presents the average 1-NN performance (active dimensions < 64).
Hardware Specification | Yes | All experiments are conducted on a server equipped with 4 RTX 4090 GPUs.
Software Dependencies | No | All experiments are conducted in a consistent GPU environment using PyTorch (Paszke et al., 2019). The paper mentions PyTorch but does not provide a specific version number.
Experiment Setup | Yes | The hyperparameters are given in Table 3 (implementation details on the image experiment): Backbone ResNet50; d 2048; h 8192; lr 4e-5; epochs 10; batch size 4096; k_aux 512; β 1/32; γ 0.1; K 8, 16, 32, ..., 2048; optimizer Adam; weight decay 1e-4; eps 6.25e-10.
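To make the Table 3 hyperparameters concrete, below is a minimal sketch of the TopK sparse encoding step that the cited OpenAI sparse_autoencoder codebase is built around, instantiated with the paper's d = 2048, h = 8192, and one of the tested sparsity levels K. This is an illustrative NumPy sketch, not the authors' implementation; all variable names and the random initialization are assumptions.

```python
import numpy as np

# Hyperparameters taken from Table 3 (ResNet50 image experiment).
D_IN = 2048      # d: backbone embedding dimension
D_HIDDEN = 8192  # h: sparse code dimension
K = 32           # one of the tested sparsity levels (8, 16, 32, ..., 2048)

rng = np.random.default_rng(0)
# Random encoder weights with tied decoder init -- a common SAE choice,
# assumed here for illustration.
W_enc = rng.standard_normal((D_IN, D_HIDDEN)) / np.sqrt(D_IN)
W_dec = W_enc.T.copy()

def topk_encode(x, k=K):
    """Keep only the k largest pre-activations per row; zero the rest."""
    z = x @ W_enc
    idx = np.argpartition(z, -k, axis=-1)[..., -k:]  # indices of k largest
    sparse = np.zeros_like(z)
    np.put_along_axis(sparse, idx,
                      np.take_along_axis(z, idx, axis=-1), axis=-1)
    return sparse

x = rng.standard_normal((4, D_IN))   # a batch of dense embeddings
z = topk_encode(x)                   # each code has exactly K active dims
x_hat = z @ W_dec                    # linear reconstruction of the input
```

The point of the sketch is the adaptivity claim: varying K (the paper sweeps 8 through 2048) directly controls how many active dimensions each representation uses, without retraining the backbone.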