Learn Beneficial Noise as Graph Augmentation

Authors: Siqi Huang, Yanchen Xu, Hongyuan Zhang, Xuelong Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments to evaluate our model by answering the following questions. Q1: Does our proposed PiNGDA on graphs outperform existing baseline methods? Q2: Is the π-noise we trained more useful than random noise, and how does each component affect model performance? Q3: How does our method perform in terms of time and space efficiency? Q4: What does the π-noise look like? We begin with a brief introduction of the experimental settings, followed by a detailed presentation of the experimental results and their analysis.
Researcher Affiliation | Collaboration | 1 School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University; 2 Institute of Artificial Intelligence (TeleAI), China Telecom; 3 The University of Hong Kong.
Pseudocode | Yes | Algorithm 1: Pseudocode of PiNGDA
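The paper's Algorithm 1 is not reproduced here, but the setup described below (two augmented views, a shared GCN encoder, a projection head, and a temperature τ) matches the standard two-view graph contrastive pipeline, which typically optimizes an InfoNCE-style objective. A minimal NumPy sketch of that loss step, simplified to inter-view negatives only; the function name and toy data are hypothetical, not the authors' code:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Simplified InfoNCE loss between two (n, d) view embeddings.

    Matching rows of z1/z2 are positives (diagonal); all other rows
    of the other view act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau  # (n, n) temperature-scaled cosine similarities
    # Row-wise log-softmax; positives sit on the diagonal.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Nearly identical views should score a lower loss than unrelated views.
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce(z, rng.normal(size=(8, 16)))
```

The key intuition is that a learned (rather than random) augmentation can keep the two views informative about each other, which this loss rewards.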
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | We use seven benchmark datasets for semi-supervised node classification, including Cora, Citeseer, Pubmed (Sen et al., 2008), Wiki-CS (Mernyei & Cangea, 2020), Amazon-Photo, Coauthor-CS (Shchur et al., 2019) and ogbn-arxiv (Hu et al., 2020). ... We evaluate our proposed framework in the semi-supervised learning setting on graph classification on the benchmark TUDataset (Morris et al., 2020).
Dataset Splits | Yes | For all datasets, we randomly split the datasets, where 10%, 10%, and the remaining 80% of nodes are selected for the training, validation, and test set, respectively.
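The 10%/10%/80% random node split described above is straightforward to reproduce with a seeded shuffle. A sketch (not the authors' code; function name is hypothetical), using Cora's 2708 nodes as the example:

```python
import numpy as np

def random_split(num_nodes, train_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle node indices and cut them into train/val/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Cora has 2708 nodes -> 270 train, 270 val, 2168 test
train_idx, val_idx, test_idx = random_split(2708)
```

Fixing the seed makes the split reproducible across runs, which matters when reported numbers are averaged over random splits.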
Hardware Specification | Yes | Our experiments are conducted on an NVIDIA 4090 GPU (24 GB memory) for most datasets and on an NVIDIA A100 GPU (40 GB memory) for OGB-arxiv.
Software Dependencies | No | The paper mentions general techniques and components like GCN, PReLU, ReLU, and Gumbel-Softmax, but does not specify any software libraries or frameworks with their version numbers.
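The Gumbel-Softmax mentioned above is the standard trick for making discrete sampling (e.g. per-edge keep/drop decisions in a learnable noise generator) differentiable. A minimal framework-independent NumPy illustration of the forward sampling step; the paper's actual implementation is not specified:

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Draw a relaxed (continuous) one-hot sample from a categorical.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax; as tau -> 0 the output approaches a hard one-hot vector.
    """
    rng = rng if rng is not None else np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Two logits, e.g. (keep-edge, drop-edge); output is a soft keep/drop decision
sample = gumbel_softmax(np.array([2.0, 0.5]), tau=0.5,
                        rng=np.random.default_rng(0))
```

In training, such relaxed samples let gradients flow from the contrastive loss back into the noise generator's logits.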
Experiment Setup | Yes | For our proposed method, we employ a two-layer GCN network with PReLU activation, where the hidden layer dimension is set to 512 and the final embedding dimension is 256. Additionally, we utilize a projection head, consisting of a 256-dimensional fully connected layer with ReLU activation, followed by a 256-dimensional linear layer. ... Some hyper-parameters of the experiment vary across datasets, as shown in Table 9. For the learnable noise generators, we use separate optimizers with learning rates of 0.0001 for edges and 0.001 for features, and apply a weight decay of 0.0001 to both. Specifically, we carry out a grid search for the hyper-parameters over the following search space:
- Number of training epochs: {500, 1000, 1500, 2000, 3000}
- Learning rate for training: {1e-2, 1e-3, 5e-4}
- Weight decay for training: {1e-3, 1e-4}
- Random edge dropping rates pe1, pe2: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}
- Random feature masking rates pf1, pf2: {0.1, 0.3, 0.5}
- Temperature τ: {0.3, 0.4, 0.5}
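The encoder described above (two GCN layers with PReLU, then an FC + ReLU + linear projection head) can be sketched shape-wise in NumPy on a toy graph. Dimensions are shrunk from the paper's 512/256, all weights are random, and only the forward pass is shown, so this illustrates the propagation rule H' = Â H W (with Â the symmetrically normalized adjacency with self-loops) rather than a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    return np.diag(d ** -0.5) @ A_hat @ np.diag(d ** -0.5)

def prelu(x, a=0.25):
    """PReLU with a fixed slope (learned per-channel in practice)."""
    return np.where(x > 0, x, a * x)

# Toy 4-node cycle graph with 8-dim features (paper: 512 hidden, 256 embed)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 8))
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 6))  # GCN weights
P1, P2 = rng.normal(size=(6, 6)), rng.normal(size=(6, 6))    # projection head

A_hat = norm_adj(A)
H = prelu(A_hat @ X @ W1)            # GCN layer 1 -> hidden representation
Z = A_hat @ H @ W2                   # GCN layer 2 -> node embeddings
proj = np.maximum(Z @ P1, 0) @ P2    # FC + ReLU, then linear layer
```

The projection head operates on node embeddings only (no graph propagation), matching the common practice of computing the contrastive loss in the projected space while using Z for downstream evaluation.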