Node Similarities under Random Projections: Limits and Pathological Cases

Authors: Tvrtko Tadić, Cassiano O Becker, Jennifer Neville

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. ... The computational experiments in the next section will show, using real graphs, that the standard measure for ranking NDCG will depend on node degrees in the case of the dot product for both P = A and P = T, while it will be stable in case of the cosine similarity. ... We illustrate the results developed in the previous section for a ranking application over a real graph. To that end, we consider the Gleich/wikipedia-20060925 dataset from the University of Florida Sparse Matrix Collection (Davis & Hu, 2011).
Researcher Affiliation Industry Tvrtko Tadi c Microsoft Search, Assistant and Intelligence EMAIL Cassiano O. Becker Microsoft Search, Assistant and Intelligence EMAIL Jennifer Neville Microsoft Research EMAIL
Pseudocode No The paper describes methods and theoretical results but does not include any clearly labeled pseudocode or algorithm blocks. It contains mathematical derivations, theorems, and proofs.
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials.
Open Datasets Yes We illustrate the results developed in the previous section for a ranking application over a real graph. To that end, we consider the Gleich/wikipedia-20060925 dataset from the University of Florida Sparse Matrix Collection (Davis & Hu, 2011).
Dataset Splits No The paper describes a sampling strategy for evaluation: "we split the set of nodes into three segments of size L = 1 106 (low, medium, and high) based on their ordered degrees. Then, we select nodes into three subsets by sampling 300 out of L nodes without replacement from each of these segments, and take S = Slow Smed Shigh as our evaluation sample." However, it does not provide traditional training/test/validation splits for model reproduction.
Hardware Specification No The paper mentions "computational experiments" but does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for these experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for the experiments.
Experiment Setup Yes With the previous definitions, we can compute the empirical distributions of ηi over the node degrees di for the three variants of similarity considered, with q = 256 for the random projection dimension.