Matrix Product Sketching via Coordinated Sampling
Authors: Majid Daliri, Juliana Freire, Danrong Li, Christopher Musco
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching. |
| Researcher Affiliation | Academia | Majid Daliri, Juliana Freire, Danrong Li, Christopher Musco (New York University; Pennsylvania State University) |
| Pseudocode | Yes | Algorithm 1: Priority Sampling; Algorithm 2: Approximate Matrix Multiplication; Algorithm 3: Sketching for Regression (not optimized) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The first dataset includes Android application reviews with user ratings, which reflect the quality of the applications (Grano et al., 2017). The second dataset contains IMDB movie reviews, labeled as positive or negative (Maas et al., 2011). Alongside these datasets, we use the (Bai et al., 2023) dataset to produce a long text prompt from its Multi Field QA dataset for the task of KV cache in transformers. |
| Dataset Splits | No | The paper mentions using "10,000 random reviews" for generating matrix A from the IMDB and Android datasets, but does not specify how these reviews are split into training, validation, or test sets for the regression experiments. It also does not specify splits for the KV cache dataset. |
| Hardware Specification | No | The paper mentions using the LLaMA 2 model but does not specify the hardware (e.g., GPU models, CPU, memory) used to run its own experiments or conduct its evaluations. |
| Software Dependencies | No | The paper mentions tools like TF-IDF and SPLADE (Formal et al., 2022) and the LLaMA 2 model, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments. |
| Experiment Setup | No | The paper describes how the synthetic data is generated (Gaussian entries with 10% outliers) and how the matrix A for regression is formed (TF-IDF or SPLADE embeddings of reviews), but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed system-level training configurations for any of the models or tasks evaluated. |
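To make the pseudocode entries above concrete, here is a minimal, hedged sketch of coordinated priority sampling for inner-product estimation, the building block behind the paper's Algorithm 1 and Algorithm 2. The function names (`priority_sample`, `coordinated_inner_product`), the choice of squared entries as weights, and the exact form of the inclusion-probability correction are our own illustrative assumptions, not code from the paper; the key idea shown is that both parties sample using the *same* shared uniform randomness, so heavy coordinates survive in both sketches.

```python
import numpy as np

def priority_sample(x, k, u):
    """Keep (up to) the k nonzero coordinates of x with the largest
    priorities x_i^2 / u_i, where u holds shared uniform randoms.
    Returns the kept index set and the priority threshold tau."""
    nz = np.flatnonzero(x)
    pri = x[nz] ** 2 / u[nz]                 # priority of each nonzero entry
    if len(nz) <= k:
        return set(nz.tolist()), 0.0         # keep everything; tau = 0 means p_i = 1
    order = np.argsort(-pri)
    tau = pri[order[k]]                      # (k+1)-th largest priority
    return set(nz[order[:k]].tolist()), tau

def coordinated_inner_product(a, b, k, seed=0):
    """Estimate <a, b> from two priority-sampling sketches built
    independently but from the same uniform randoms (coordinated sampling)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(a))             # shared randomness: same seed on both sides
    Sa, tau_a = priority_sample(a, k, u)
    Sb, tau_b = priority_sample(b, k, u)
    est = 0.0
    for i in Sa & Sb:
        # approximate inclusion probability on each side, clipped at 1
        pa = 1.0 if tau_a == 0 else min(1.0, a[i] ** 2 / tau_a)
        pb = 1.0 if tau_b == 0 else min(1.0, b[i] ** 2 / tau_b)
        est += a[i] * b[i] / min(pa, pb)     # reweight by the joint inclusion prob
    return est
```

When k is at least the number of nonzeros, every coordinate is kept with probability 1 and the estimate is exact; for smaller k the estimate is a sampled approximation whose error depends on how concentrated the vectors' mass is.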
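The experiment-setup row notes that the synthetic data is Gaussian with 10% outliers but gives no further parameters. A plausible reading of that setup, with the matrix size, the outlier scale factor, and the row-wise outlier placement all chosen by us for illustration, might look like:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 1000, 20
A = rng.normal(size=(n, d))                       # bulk entries ~ N(0, 1)
out = rng.choice(n, size=n // 10, replace=False)  # 10% of rows marked as outliers
A[out] *= 10.0                                    # outlier scale is an assumption
```

Sampling-based sketches tend to benefit from such heavy-tailed inputs, since a few rows carry most of the norm and are sampled with high probability.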