Optimal Inference in Contextual Stochastic Block Models

Authors: O. Duranthon, Lenka Zdeborová

TMLR 2024

Reproducibility assessment — each entry lists the variable, its result, and the supporting LLM response:
Research Type: Experimental
  "We show that there can be a considerable gap between the accuracy reached by this algorithm and the performance of the GNN architectures proposed in the literature. This suggests that the CSBM, along with the comparison to the performance of the optimal algorithm, readily accessible via our implementation, can be instrumental in the development of more performant GNN architectures."
Researcher Affiliation: Academia
  "O. Duranthon and L. Zdeborová, Statistical Physics of Computation laboratory, École polytechnique fédérale de Lausanne (EPFL), Switzerland, EMAIL"
Pseudocode: Yes
  "Algorithm 1: The AMP–BP algorithm. Algorithm 2: The AMP–AMP algorithm. Algorithm 3: The multi-community AMP–BP algorithm."
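The paper's Algorithm 1 couples belief propagation on the graph with AMP on the high-dimensional features; reproducing it faithfully requires the paper's derivation. As a hedged illustration only, the sketch below implements just the graph-side half — standard (Decelle-style) BP for a symmetric two-group sparse SBM, with no feature channel and no damping. It is not the authors' algorithm or implementation.

```python
import numpy as np

def bp_sbm(edges, N, c_in, c_out, n_iter=30, seed=0):
    """Belief propagation for the symmetric two-group sparse SBM.

    edges: list of undirected edges (i, j); C[s, t] / N is the edge
    probability between a node of group s and a node of group t.
    Returns node marginals of shape (N, 2).
    """
    rng = np.random.default_rng(seed)
    C = np.array([[c_in, c_out], [c_out, c_in]], dtype=float)
    nbrs = [[] for _ in range(N)]
    msg = {}                             # one message per directed edge
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        for e in ((i, j), (j, i)):
            m = rng.random(2) + 0.5      # random init to break symmetry
            msg[e] = m / m.sum()
    marg = np.full((N, 2), 0.5)
    for _ in range(n_iter):
        # mean-field contribution of the non-edges (external field)
        h = C @ marg.mean(axis=0)
        new_msg = {}
        for i in range(N):
            inc = {k: np.log(C @ msg[(k, i)]) for k in nbrs[i]}
            tot = -h + sum(inc.values())
            m = np.exp(tot - tot.max())
            marg[i] = m / m.sum()        # node marginal (belief)
            for k in nbrs[i]:
                cav = tot - inc[k]       # cavity: drop k's contribution
                m = np.exp(cav - cav.max())
                new_msg[(i, k)] = m / m.sum()
        msg = new_msg
    return marg
```

On an assortative instance well above the detectability threshold, the argmax of the returned marginals recovers the two groups up to a global label swap.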
Open Source Code: Yes
  "We provide a simple-to-use implementation of the algorithm (attached in the Supplementary Material) and in our repository." Footnote 1: gitlab.epfl.ch/spoc-idephics/csbm
Open Datasets: No
  "Our motivation to study the CSBM is due to the interest this model has recently received in the community developing and analyzing graph neural networks (GNNs). Indeed, this model provides an idealized synthetic dataset on which graph neural networks can be conveniently evaluated and benchmarked."
Dataset Splits: Yes
  "We define Ξ as the set of revealed training nodes, that are observed. We set ρ = |Ξ|/N; ρ = 0 for unsupervised learning. We assume Ξ is drawn independently with respect to the group memberships. We define P_{U,i} an additional node-dependent prior. It is used to inject information about the memberships of the observed nodes. [...] We consider the same task as before: on a single instance of CSBM a fraction ρ of node labels are revealed and the GNN must guess the hidden labels."
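The split described above is simple to reproduce: each node joins the revealed set Ξ independently with probability ρ, and the revealed labels enter through a node-dependent prior. A minimal sketch, with helper and variable names that are mine rather than the paper's:

```python
import numpy as np

def reveal_labels(labels, rho, seed=0):
    """Draw the revealed set Xi: each node enters independently with
    probability rho, independently of the group memberships (rho = 0
    recovers the unsupervised setting).  Also builds the node-dependent
    prior P_{U,i}: uniform on hidden nodes, a delta on revealed ones.
    Hypothetical helper, not taken from the paper's code.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    N = labels.size
    q = int(labels.max()) + 1            # number of communities
    revealed = rng.random(N) < rho       # Xi as a boolean mask
    prior = np.full((N, q), 1.0 / q)     # uniform prior on hidden nodes
    prior[revealed] = np.eye(q)[labels[revealed]]
    return revealed, prior
```

Because membership in Ξ is drawn independently of the group labels, the revealed fraction concentrates around ρ for large N while staying unbiased across communities.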
Hardware Specification: Yes
  "Computationally, running the code for a single experiment, N = 3 × 10^4, α = 1 and d = 5 takes around one minute on one CPU core."
Software Dependencies: No
  "We use the architectures implemented by the Graph Gym package (You et al., 2020)."
Experiment Setup: Yes
  "Intra-layer parameters: we take the internal dimension h = 64; we use batch normalization; no dropout; PReLU activation; add aggregation; convolution operation in {generalconv, gcnconv, gatconv} (as defined in the config.py file). Inter-layer design: we take K ∈ {1, 2, 3, 4} layers of message-passing; no pre-process layer; one post-process layer; stack connection. Training configuration: the batch size is one, we train on the entire graph, revealing a proportion ρ of labels; the learning rate is 3 × 10^−3; we train for forty epochs with Adam; weight decay is 5 × 10^−4."
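The quoted hyperparameters map naturally onto GraphGym's configuration system. The sketch below collects them as a plain Python dict; the key names follow common GraphGym config fields but are assumptions on my part, not the authors' actual configuration file.

```python
# Illustrative GraphGym-style configuration mirroring the quoted setup.
# Key names follow common GraphGym config fields but are assumptions,
# not the authors' actual configuration file.
config = {
    "gnn": {
        "dim_inner": 64,          # internal dimension h
        "batchnorm": True,
        "dropout": 0.0,
        "act": "prelu",
        "agg": "add",
        "layer_type": "gcnconv",  # or "generalconv" / "gatconv"
        "layers_mp": 2,           # K in {1, 2, 3, 4}
        "layers_pre_mp": 0,       # no pre-process layer
        "layers_post_mp": 1,      # one post-process layer
        "stage_type": "stack",
    },
    "train": {"batch_size": 1},   # a single graph per batch
    "optim": {
        "optimizer": "adam",
        "base_lr": 3e-3,
        "max_epoch": 40,
        "weight_decay": 5e-4,
    },
}
```

Sweeping `layer_type` over the three listed convolutions and `layers_mp` over K ∈ {1, 2, 3, 4} reproduces the grid of architectures described in the quote.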