Matrix Product Sketching via Coordinated Sampling
Authors: Majid Daliri, Juliana Freire, Danrong Li, Christopher Musco
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching. |
| Researcher Affiliation | Academia | Majid Daliri, Juliana Freire, Danrong Li, Christopher Musco (New York University; Pennsylvania State University) |
| Pseudocode | Yes | Algorithm 1: Priority Sampling; Algorithm 2: Approximate Matrix Multiplication; Algorithm 3: Sketching for Regression (not optimized) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The first dataset includes Android application reviews with user ratings, which reflect the quality of the applications (Grano et al., 2017). The second dataset contains IMDB movie reviews, labeled as positive or negative (Maas et al., 2011). Alongside these datasets, we use the (Bai et al., 2023) dataset to produce a long text prompt from its Multi Field QA dataset for the task of KV cache in transformers. |
| Dataset Splits | No | The paper mentions using "10,000 random reviews" for generating matrix A from the IMDB and Android datasets, but does not specify how these reviews are split into training, validation, or test sets for the regression experiments. It also does not specify splits for the KV cache dataset. |
| Hardware Specification | No | The paper mentions using the LLaMA 2 model but does not specify the hardware (e.g., GPU models, CPU, memory) used to run its own experiments or conduct its evaluations. |
| Software Dependencies | No | The paper mentions tools like TF-IDF and SPLADE (Formal et al., 2022) and the LLaMA 2 model, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments. |
| Experiment Setup | No | The paper describes how the synthetic data is generated (Gaussian entries with 10% outliers) and how the matrix A for regression is formed (TF-IDF or SPLADE embeddings of reviews), but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed system-level training configurations for any of the models or tasks evaluated. |
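To make the pseudocode entries above concrete, here is a minimal, hedged sketch of coordinated priority sampling for inner-product estimation, the building block behind the paper's Algorithm 1 and Algorithm 2. The function names (`priority_sample`, `coordinated_inner_product`), the choice of squared entries as weights, and the exact form of the inclusion-probability correction are our own illustrative assumptions, not code from the paper; the key idea shown is that both parties sample using the *same* shared uniform randomness, so heavy coordinates survive in both sketches.

```python
import numpy as np

def priority_sample(x, k, u):
    """Keep (up to) the k nonzero coordinates of x with the largest
    priorities x_i^2 / u_i, where u holds shared uniform randoms.
    Returns the kept index set and the priority threshold tau."""
    nz = np.flatnonzero(x)
    pri = x[nz] ** 2 / u[nz]                 # priority of each nonzero entry
    if len(nz) <= k:
        return set(nz.tolist()), 0.0         # keep everything; tau = 0 means p_i = 1
    order = np.argsort(-pri)
    tau = pri[order[k]]                      # (k+1)-th largest priority
    return set(nz[order[:k]].tolist()), tau

def coordinated_inner_product(a, b, k, seed=0):
    """Estimate <a, b> from two priority-sampling sketches built
    independently but from the same uniform randoms (coordinated sampling)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(a))             # shared randomness: same seed on both sides
    Sa, tau_a = priority_sample(a, k, u)
    Sb, tau_b = priority_sample(b, k, u)
    est = 0.0
    for i in Sa & Sb:
        # approximate inclusion probability on each side, clipped at 1
        pa = 1.0 if tau_a == 0 else min(1.0, a[i] ** 2 / tau_a)
        pb = 1.0 if tau_b == 0 else min(1.0, b[i] ** 2 / tau_b)
        est += a[i] * b[i] / min(pa, pb)     # reweight by the joint inclusion prob
    return est
```

When k is at least the number of nonzeros, every coordinate is kept with probability 1 and the estimate is exact; for smaller k the estimate is a sampled approximation whose error depends on how concentrated the vectors' mass is.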
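The experiment-setup row notes that the synthetic data is Gaussian with 10% outliers but gives no further parameters. A plausible reading of that setup, with the matrix size, the outlier scale factor, and the row-wise outlier placement all chosen by us for illustration, might look like:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 1000, 20
A = rng.normal(size=(n, d))                       # bulk entries ~ N(0, 1)
out = rng.choice(n, size=n // 10, replace=False)  # 10% of rows marked as outliers
A[out] *= 10.0                                    # outlier scale is an assumption
```

Sampling-based sketches tend to benefit from such heavy-tailed inputs, since a few rows carry most of the norm and are sampled with high probability.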