reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Node Similarities under Random Projections: Limits and Pathological Cases

Authors: Tvrtko Tadić, Cassiano O Becker, Jennifer Neville

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. ... The computational experiments in the next section will show, using real graphs, that the standard measure for ranking NDCG will depend on node degrees in the case of the dot product for both P = A and P = T, while it will be stable in case of the cosine similarity. ... We illustrate the results developed in the previous section for a ranking application over a real graph. To that end, we consider the Gleich/wikipedia-20060925 dataset from the University of Florida Sparse Matrix Collection (Davis & Hu, 2011).
Researcher Affiliation	Industry	Tvrtko Tadi c Microsoft Search, Assistant and Intelligence EMAIL Cassiano O. Becker Microsoft Search, Assistant and Intelligence EMAIL Jennifer Neville Microsoft Research EMAIL
Pseudocode	No	The paper describes methods and theoretical results but does not include any clearly labeled pseudocode or algorithm blocks. It contains mathematical derivations, theorems, and proofs.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials.
Open Datasets	Yes	We illustrate the results developed in the previous section for a ranking application over a real graph. To that end, we consider the Gleich/wikipedia-20060925 dataset from the University of Florida Sparse Matrix Collection (Davis & Hu, 2011).
Dataset Splits	No	The paper describes a sampling strategy for evaluation: "we split the set of nodes into three segments of size L = 1 106 (low, medium, and high) based on their ordered degrees. Then, we select nodes into three subsets by sampling 300 out of L nodes without replacement from each of these segments, and take S = Slow Smed Shigh as our evaluation sample." However, it does not provide traditional training/test/validation splits for model reproduction.
Hardware Specification	No	The paper mentions "computational experiments" but does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for these experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for the experiments.
Experiment Setup	Yes	With the previous definitions, we can compute the empirical distributions of ηi over the node degrees di for the three variants of similarity considered, with q = 256 for the random projection dimension.