Revisiting Random Walks for Learning on Graphs
Authors: Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate RWNNs on a range of problems, verifying our theoretical analysis and demonstrating the use of language models for separating strongly regular graphs where the 3-WL test fails, and transductive classification on the arXiv citation network. |
| Researcher Affiliation | Academia | Jinwoo Kim¹, Olga Zaghen², Ayhan Suleymanzade¹, Youngmin Ryou¹, Seunghoon Hong¹ (¹KAIST, ²University of Amsterdam) |
| Pseudocode | Yes | We leave the pseudocode in Appendix A.2 and notations and proofs in Appendix A.9. |
| Open Source Code | Yes | Code is available at https://github.com/jw9730/random-walk. |
| Open Datasets | Yes | The Circular Skip Links (CSL) graphs dataset (Murphy et al., 2019) contains 10 non-isomorphic regular graphs with 41 vertices of degree 4. The 4×4 rook's and Shrikhande graphs (Alvarez-Gonzalez et al., 2024), which we call SR16, are a pair of strongly regular graphs with 16 vertices of degree 6. The SR25 dataset (Balcilar et al., 2021) contains 15 strongly regular graphs with 25 vertices of degree 12. We use ogbn-arxiv (Hu et al., 2020), a citation network of 169,343 arXiv papers with title and abstract. We use a synthetic dataset of 5,000 random regular graphs from Chen et al. (2020) and consider the task of counting substructures. We provide additional experiments on real-world transductive classification datasets Cora, Citeseer, and Amazon Ratings. We conduct a preliminary demonstration of RWNN on real-world graph classification, using the Peptides-func dataset (Dwivedi et al., 2022). |
| Dataset Splits | Yes | We choose restart rate α = 0.7 (Section 2.2) by hyperparameter search on a 100-sample subset of the validation data. We use the data splits from Chen et al. (2020); Zhao et al. (2022b); Zhang et al. (2024), while excluding disconnected graphs following our assumption in Section 2.1. The task is transductive classification into 40 areas such as "cs.AI" using a set of labeled vertices. |
| Hardware Specification | Yes | We implement training and inference pipelines in PyTorch with NVIDIA GPUs and Intel Gaudi 2 compatibility. For example, in Section 5.3, each test prediction of RWNN-Llama-3-70b takes around 0.410 ± 0.058 seconds on a 4×H100 machine (RWNN-Llama-3-8b is faster, 0.116 ± 0.064 seconds per prediction). |
| Software Dependencies | No | The paper mentions software like "PyTorch", "DeBERTa-base language model", "AdamW optimizer", "Llama 3", and "C++" but does not provide specific version numbers for these or any other libraries or frameworks. |
| Experiment Setup | Yes | We train all models with the Adam optimizer with batch size 1 and learning rate 1e-4 for 500 epochs, closely following Bamberger et al. (2024) except for the lower learning rate, which we found necessary for stabilizing training due to the stochasticity of walk-based methods. For all RWNN variants, we perform 8 predictions per vertex during training, and average 32 predictions per vertex at test time. Our RWNN uses the same design as in Section 5.2, and we train them with L1 loss for 250k steps using a batch size of 128 graphs for graph-level counting and 8 graphs for vertex-level counting. At test time, we ensemble 64 and 32 predictions by averaging for graph- and vertex-level tasks, respectively. We fine-tune it with cross-entropy loss for 100k steps using the AdamW optimizer with 2e-5 learning rate and 0.01 weight decay following Kim et al. (2023). We truncate the input to 512 tokens due to memory constraints. We use batch size 256, and accumulate gradients for 8 steps for SR25. |
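The table above mentions a restart rate α = 0.7 chosen by hyperparameter search (Section 2.2 of the paper). As a minimal sketch of what a random walk with restart looks like, the following Python function treats α as the per-step restart probability; the paper's actual implementation and its exact convention for α (restart vs. continuation probability) may differ, and the function name and adjacency-list format here are illustrative assumptions, not taken from the released code.

```python
import random

def random_walk_with_restart(adj, start, length, alpha=0.7, seed=None):
    """Sample a walk of `length` steps on a graph given as an
    adjacency list `adj` (dict: vertex -> list of neighbors).

    At each step, with probability `alpha` the walker jumps back to
    the start vertex (restart); otherwise it moves to a uniformly
    random neighbor. alpha=0.7 is the restart rate reported in the
    table above; whether the paper defines alpha as the restart or
    the continuation probability is an assumption here.
    """
    rng = random.Random(seed)
    walk = [start]
    current = start
    for _ in range(length):
        if rng.random() < alpha:          # restart with probability alpha
            current = start
        else:                             # otherwise step to a random neighbor
            current = rng.choice(adj[current])
        walk.append(current)
    return walk
```

Such sampled walks (recorded together with vertex identities) would then be serialized into a sequence and fed to the downstream sequence model, e.g. the DeBERTa or Llama 3 backbones the paper uses.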