The Sparse Matrix-Based Random Projection: A Study of Binary and Ternary Quantization

Authors: Weizhi Lu, Zhongzheng Li, Mingrui Chen, Weiyu Li

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This is validated through classification and clustering experiments, where extremely sparse binary matrices, with only one nonzero entry per column, achieve superior or comparable performance to other denser binary matrices and Gaussian matrices.
Researcher Affiliation | Academia | Weizhi Lu (School of Control Science and Engineering, Shandong University; Key Laboratory of Machine Intelligence and System Control, Ministry of Education); Zhongzheng Li (School of Control Science and Engineering, Shandong University); Mingrui Chen (School of Control Science and Engineering, Shandong University); Weiyu Li (Zhongtai Securities Institute for Financial Studies, Shandong University; National Center for Applied Mathematics in Shandong)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides mathematical formulations and derivations, but no structured algorithm steps.
Open Source Code | No | The paper makes no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | The sparse data intended for projection are generated from the datasets Yale B (Georghiades et al., 2001; Lee et al., 2005), CIFAR10 (Krizhevsky & Hinton, 2009) and Mini-ImageNet (Vinyals et al., 2016), respectively via the feature transforms DWT (Mallat, 2009), AlexNet Conv5 (Krizhevsky et al., 2012) and VGG16 Conv5_3 (Simonyan & Zisserman, 2014).
Dataset Splits | Yes | From the dataset, we randomly select 9/10 samples for training and the rest for testing. CIFAR10 consists of 10 classes of color images, with 6000 samples per class. Mini-ImageNet is a subset of ImageNet (Deng et al., 2009), which consists of 100 classes of color images, each class having 600 samples. For the latter two datasets, we use their default training and testing samples, with the ratio of 5/1.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU/CPU models, memory details).
Software Dependencies | No | The paper mentions using KNN, SVM with a linear kernel, and k-means algorithms, but does not provide specific version numbers for any software libraries or frameworks used for their implementation.
Experiment Setup | No | The paper specifies feature sparsity ratios (k/n = 1%, 5%, 10%, 20%) and projection ratios (m/n = 10%, 50%), and notes using KNN and SVM for classification and k-means for clustering based on cosine distance. However, it omits key hyperparameters, such as the K value for KNN, the C parameter for SVM, and the number of clusters k for k-means, which are essential for reproducing the experimental setup.
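The core construction the Research Type row describes is an extremely sparse binary projection matrix with only one nonzero entry per column, whose projections are then quantized. A minimal NumPy sketch of that idea, not the paper's implementation: the +1 nonzero value, the matrix density, the zero quantization threshold, and the toy sparsity/projection ratios below are all assumptions chosen to mirror the ratios quoted in the table.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_binary_projection(n, m, rng):
    """Build an m x n binary matrix with exactly one nonzero (+1, assumed) per column."""
    R = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)   # one random row index per column
    R[rows, np.arange(n)] = 1.0
    return R

def binary_quantize(y):
    """Quantize projections to {-1, +1} around zero (assumed threshold)."""
    return np.where(y >= 0, 1.0, -1.0)

def ternary_quantize(y):
    """Quantize projections to {-1, 0, +1}; sign is the simplest such rule."""
    return np.sign(y)

# Toy sparse feature: n-dim with sparsity k/n = 10%, projection ratio m/n = 10%,
# mirroring two of the ratios reported in the table.
n, m, k = 1000, 100, 100
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

R = sparse_binary_projection(n, m, rng)
y = R @ x                               # random projection of the sparse feature
print(binary_quantize(y)[:10])
```

Because each column holds a single nonzero, computing `R @ x` amounts to scattering each feature coordinate into one output bin, which is what makes this family of matrices so cheap compared to dense Gaussian projections.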
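The Experiment Setup row notes that the classifier and clustering hyperparameters are unspecified, so any reproduction has to pick its own. A hypothetical scikit-learn sketch of the evaluation stage on stand-in features: K = 1 for KNN, C = 1.0 for the linear SVM, and k equal to the number of classes for k-means are all assumed values, not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))    # stand-in for projected (quantized) features
y = rng.integers(0, 10, size=200)     # 10 classes, as in CIFAR10

# Cosine distance for KNN matches the table's note; n_neighbors=1 is assumed.
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine").fit(X, y)
svm = LinearSVC(C=1.0).fit(X, y)      # C=1.0 assumed (paper does not state it)
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)  # k = #classes, assumed

print(knn.score(X, y), svm.score(X, y))
```

Reporting results alongside the assumed K, C, and k values, as this sketch forces one to do, is exactly the detail whose absence the row flags.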