Learning Graph Structure from Convolutional Mixtures
Authors: Max Wasserman, Saurabh Sihag, Gonzalo Mateos, Alejandro Ribeiro
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present comprehensive experiments on link prediction and edge-weight regression tasks using synthetic data (Section 5.1) as well as real HCP-YA neuroimaging and social network data (Section 5.2). |
| Researcher Affiliation | Academia | Max Wasserman EMAIL Department of Computer Science University of Rochester; Saurabh Sihag EMAIL Department of Electrical and Systems Engineering University of Pennsylvania; Gonzalo Mateos EMAIL Department of Electrical and Computer Engineering University of Rochester; Alejandro Ribeiro EMAIL Department of Electrical and Systems Engineering University of Pennsylvania |
| Pseudocode | No | The paper describes the methodology through conceptual iterations and mathematical formulations (e.g., equations (3), (4), (6), (11), (12), (16), (17), (18)) and narrative text, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available on GitHub at https://github.com/maxwass/pyGSL. |
| Open Datasets | Yes | Raw data available from https://www.humanconnectome.org/study/hcp-young-adult/overview, and subject to the HCP Data Use Agreement. (Appendix A.6, page 15); The Thiers13 dataset (Génois & Barrat, 2018) monitored high school students, recording: (i) physical co-location interactions with wearable sensors over 5 days; and (ii) social network information via survey. (Section 5.2, page 8) |
| Dataset Splits | Yes | For the sizes of the training/validation/test splits, the pseudo-synthetic domain uses 913/50/100 and the synthetic domains use 913/500/500. (Section 5.1, page 5); From this data, we extracted a dataset of 1063 FC-SC pairs, T = \{FC^(i), SC^(i)\}_{i=1}^{1063}, and use a training/validation/test split of 913/50/100. (Section 5.2, page 8); We trained a GDN of depth D = 11 without MIMO filters (C = 1), with learned A[0], using a training/validation/test split of 5000/1000/1000. (Section 5.2, page 8) |
| Hardware Specification | Yes | We use one Nvidia T4 GPU; models take < 15 minutes to train on all datasets. (Section 5, page 5); GDN, L2G, and GLAD report time (s) for a forward and backward pass on a single sample (batch size = 32), as well as peak GPU memory consumption, using a T4 GPU on a g4dn.xlarge AWS instance. Spec Temp reports inference time (s) for a single sample, and peak memory consumption, using a 96 Core x86 Processor on a c5.metal AWS instance. (Table 3, Appendix A.8, page 17) |
| Software Dependencies | No | We rely on PyTorch (Paszke et al., 2019) (BSD license) heavily and use Conda (Anaconda, 2020) (BSD license) to make our system as hardware-independent as possible. (Appendix A.9, page 17) This text mentions software tools but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Unless otherwise stated, in all the results that follow we use GDN(-S) models with D = 8 layers, C = 8 channels per layer, take prior A[0] = 0 on all domains except the SCs, where we use the sample mean of all SCs in the training set, and train using the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.01 and batch size of 200. (Section 5, page 5); Optimal values are learning rate = 1e-2, D = 8, C = 8, and A[0], A[k] normalizations of maximum eigenvalue and maximum absolute value norm, respectively. (Appendix A.8, page 18) |
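The split sizes and training hyperparameters quoted above can be pinned down in a few lines. This is a minimal, hedged sketch: the split sizes (913/50/100 over 1063 FC-SC pairs) and the hyperparameter values (ADAM, learning rate 0.01, batch size 200, D = 8, C = 8) are taken from the paper's text, but the `split_indices` helper, the seed, and the shuffling scheme are illustrative assumptions, not the authors' released code.

```python
import random

def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Partition sample indices into disjoint train/val/test index lists.

    Hypothetical helper: the paper reports only the split *sizes*,
    not how samples were assigned to each split.
    """
    assert n_train + n_val + n_test <= n_total
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for this sketch
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:n_train + n_val + n_test])

# HCP-YA / pseudo-synthetic domain: 1063 FC-SC pairs, split 913/50/100.
train, val, test = split_indices(1063, 913, 50, 100)

# Default training configuration quoted from Section 5, page 5.
config = {
    "depth_D": 8,          # GDN layers
    "channels_C": 8,       # MIMO channels per layer
    "optimizer": "ADAM",
    "learning_rate": 0.01,
    "batch_size": 200,
}
```

In a PyTorch setup matching the quoted dependencies, `config` would feed directly into `torch.optim.Adam(model.parameters(), lr=config["learning_rate"])` and a `DataLoader` with `batch_size=config["batch_size"]`.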