Learning Efficient Positional Encodings with Graph Neural Networks

Authors: Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka, Jure Leskovec, Alejandro Ribeiro

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluations show that PEARL outperforms lightweight versions of eigenvector-based PEs and achieves comparable performance to full eigenvector-based PEs, but with one or two orders of magnitude lower complexity. Our code is available at https://github.com/ehejin/Pearl-PE." Section 6, Experiments: "In this section, we assess the performance of PEARL on graph classification, graph regression and recommendation tasks."
Researcher Affiliation | Academia | Charilaos I. Kanatsoulis¹, Evelyn Choi¹, Stefanie Jegelka²,³, Jure Leskovec¹, Alejandro Ribeiro⁴ (¹Stanford University, ²MIT, ³Technical University of Munich, ⁴University of Pennsylvania)
Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "Our code is available at https://github.com/ehejin/Pearl-PE."
Open Datasets | Yes | "We first evaluate our architecture on graph classification tasks using the REDDIT-B (2,000 graphs, 2 classes, 429.6 average nodes) and REDDIT-M (5,000 graphs, 5 classes, 508.5 average nodes) datasets (Yanardag & Vishwanathan, 2015). ... We also evaluate our model on the task of predicting the penalized water-octanol partition coefficient (log P) for molecules from the ZINC dataset (Irwin et al., 2012; Dwivedi et al., 2023). ... We conduct experiments on the DrugOOD dataset (Ji et al., 2022). ... we utilize the rel-stack dataset for the relational deep learning benchmark (RelBench) (Fey et al.; Robinson et al., 2024). ... We conduct experiments on the Circular Skip Link (CSL) dataset (Murphy et al., 2019) ... We conduct experiments on the Peptides-struct dataset from the Long Range Graph Benchmark (Dwivedi et al., 2022)."
Dataset Splits | Yes | "To train the GNN models, we conduct 10-fold cross-validation. ... We use the standard split for this dataset, which entails 10,000 molecules for training, 1,000 for validation, and another 1,000 for testing. ... The dataset evaluates models on out-of-distribution (OOD) generalization, focusing on three specific types of domain shifts: Assay, Scaffold, and Size."
Hardware Specification | Yes | "All experiments were conducted on a Linux server with an NVIDIA A100 GPU."
Software Dependencies | Yes | "For our experiments and model training pipeline we follow the codebases of (Huang et al.) and (Lim et al.), using Python, PyTorch (Paszke et al., 2019), and the PyTorch Geometric (Fey & Lenssen, 2019) libraries."
Experiment Setup | Yes | "To generate the proposed PE, Φ is an L-layer message-passing GNN with batch normalization layers and skip connections, where L ∈ {7, 8, 9}. ... In all experiments we evaluated our model on selected values of K ranging from 2 to 18, as well as different sample sizes ranging from 10 to 200, and selected the best model accordingly. ... We report the best performance observed during 350 epochs of training ... For both models, we train with a batch size of 128 for REDDIT-BINARY and REDDIT-MULTI. ... We report the mean and standard deviation of the MAE for the model achieving the highest validation accuracy, averaged over 4 different seeds. ... For R-PEARL we use 50-120 samples and K = 12, while for B-PEARL we use K = 4. ... Training is conducted with a batch size of 20. ... within a 500k parameter budget."
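The 10-fold cross-validation protocol quoted under Dataset Splits can be made concrete with a short sketch. This is not the authors' pipeline (their code, per the report, follows the Huang et al. and Lim et al. codebases); it is a minimal, dependency-free illustration of how k contiguous folds over a dataset of a given size (e.g. REDDIT-B's 2,000 graphs) would be generated:

```python
# Hedged sketch, not the paper's actual split code: builds k
# (train_indices, test_indices) pairs over n_samples items using
# contiguous folds. Real pipelines typically shuffle first.

def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        # Fold i is held out for testing; everything else trains.
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

# Example: 10 folds over a 2,000-graph dataset (REDDIT-B's size).
folds = list(k_fold_splits(2000, k=10))
```

Each fold holds out 200 graphs for testing and trains on the remaining 1,800, so every graph is tested exactly once across the 10 folds.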
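The Experiment Setup row describes Φ as an L-layer message-passing GNN with batch normalization and skip connections, L ∈ {7, 8, 9}. The toy sketch below is emphatically not PEARL itself (the authors' implementation is at the linked repository and uses PyTorch/PyTorch Geometric); it only illustrates, with scalar node features and plain Python, the layer pattern the quote names: neighbor aggregation, a residual (skip) connection, and a normalization step standing in for batch norm:

```python
# Toy sketch of the described layer pattern (assumptions throughout):
# mean-aggregate neighbor features, add a skip connection, then
# standardize across nodes as a stand-in for batch normalization.

def normalize(values):
    """Standardize a list of scalars to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against a zero-variance layer
    return [(v - mean) / std for v in values]

def message_passing(h, neighbors, num_layers=7):
    """h: scalar feature per node; neighbors: dict node -> neighbor list."""
    for _ in range(num_layers):
        aggregated = [
            sum(h[u] for u in neighbors[v]) / max(len(neighbors[v]), 1)
            for v in range(len(h))
        ]
        # Skip connection: add each node's input back to its message,
        # then normalize (the batch-norm stand-in).
        h = normalize([h_v + m for h_v, m in zip(h, aggregated)])
    return h

# 4-node cycle graph with a one-hot initial feature.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
out = message_passing([1.0, 0.0, 0.0, 0.0], adj, num_layers=7)
```

In a real implementation the features would be vectors, the aggregation a learned PyTorch Geometric `MessagePassing` layer, and the normalization `torch.nn.BatchNorm1d`; the depth L ∈ {7, 8, 9} corresponds to `num_layers` here.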