GPS++: Reviving the Art of Message Passing for Molecular Property Prediction

Authors: Dominic Masters, Josef Dean, Kerstin Klaeser, Zhiyi Li, Samuel Maddrell-Mander, Adam Sanders, Hatem Helal, Deniz Beker, Andrew W Fitzgibbon, Shenyang Huang, Ladislav Rampášek, Dominique Beaini

TMLR 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Our model integrates a well-tuned local message passing component and biased global attention with other key ideas from prior literature to achieve state-of-the-art results on the large-scale molecular dataset PCQM4Mv2. Through a thorough ablation study we highlight the impact of individual components and find that nearly all of the model's performance can be maintained without any use of global self-attention. We also find that our approach is significantly more accurate than prior art when 3D positional information is not available. In Table 2, we compare the single model performance of GPS++ with results from the literature.
Researcher Affiliation | Collaboration | 1) Graphcore, 2) Valence, 3) Mila Québec AI Institute, 4) McGill University, 5) Université de Montréal
Pseudocode | No | The paper describes the GPS++ block and MPNN module using mathematical equations (e.g., equations 1-11 and Appendix A.1) and diagrams (Figure 2), but does not present a dedicated section or figure explicitly labelled as 'Pseudocode' or 'Algorithm' with structured, code-like steps.
Open Source Code | Yes | Reproducibility: Source code to reproduce our results can be found at: https://github.com/graphcore/ogb-lsc-pcqm4mv2.
Open Datasets | Yes | The PubChemQC project (Nakata & Shimazaki, 2017) is one of the largest widely available DFT databases, and from it is derived the PCQM4Mv2 dataset, released as a part of the Open Graph Benchmark Large Scale Challenge (OGB-LSC) (Hu et al., 2021), which has served as a popular testbed for development and benchmarking of novel graph neural networks (GNNs). ... In order to test GPS++ in the presence of 3D positional data during test time and to investigate the generalisability of the model, we fine-tune GPS++ on 8 different tasks from the quantum chemistry benchmark QM9 (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014).
Dataset Splits | Yes | The 3.7M molecules are separated into standardised sets, namely into training (90%), validation (2%), test-dev (4%) and test-challenge (4%) sets using a scaffold split, where the HOMO-LUMO gap targets are only publicly available for the training and validation splits. ... QM9 does not provide a standardised dataset split; we therefore follow several previous works (Luo et al., 2022; Thölke & De Fabritiis, 2022a) and randomly select 10,000 molecules for validation and 10,831 for testing; all remaining molecules are used during training.
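The QM9 split described above can be sketched in a few lines of plain Python. The dataset size passed in and the shuffling seed are assumptions for illustration; only the validation/test counts (10,000 and 10,831) come from the paper.

```python
import random

def qm9_random_split(num_molecules, n_valid=10_000, n_test=10_831, seed=0):
    """Randomly partition molecule indices into train/valid/test,
    using the split sizes reported for the QM9 fine-tuning experiments.
    The seed and shuffle strategy are illustrative assumptions."""
    rng = random.Random(seed)
    indices = list(range(num_molecules))
    rng.shuffle(indices)
    valid = indices[:n_valid]
    test = indices[n_valid:n_valid + n_test]
    train = indices[n_valid + n_test:]  # all remaining molecules
    return train, valid, test

train, valid, test = qm9_random_split(130_831)
print(len(train), len(valid), len(test))  # → 110000 10000 10831
```

Because the paper uses a random rather than scaffold split for QM9, fixing the seed is what makes the partition reproducible across runs.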
Hardware Specification | Yes | We train our models using a BOW-POD16, which contains 16 IPU processors, delivering a total of 5.6 petaFLOPS of float16 compute and 14.4 GB of in-processor SRAM which is accessible at an aggregate bandwidth of over a petabyte per second.
Software Dependencies | No | The paper mentions software like PyTorch Geometric and RDKit, and optimisers like Adam, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Our model training setup uses the Adam optimiser (Kingma & Ba, 2015) with a gradient clipping value of 5 and a peak learning rate of 4e-4, and the model is trained for a total of 450 epochs. We used a learning rate warmup period of 10 epochs followed by a linear decay schedule. The regression loss is the mean absolute error (L1 loss) between a scalar prediction and the ground truth HOMO-LUMO gap value. ... We set pcorrupt = 0.01 and weight the cross-entropy losses such that they have a ratio 1:1.2:1.2 for losses HOMO-LUMO:Noisy Nodes:Noisy Edges. ... Average batch size is kept constant (926 nodes per batch) for all runs to keep them directly comparable.
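The schedule and loss weighting quoted above can be sketched in plain Python. This is a minimal sketch under assumptions: per-epoch granularity, linear decay to zero, and the function names are illustrative, not from the paper's code.

```python
def learning_rate(epoch, peak_lr=4e-4, warmup_epochs=10, total_epochs=450):
    """Linear warmup to peak_lr over the first 10 epochs, then linear
    decay over the remaining epochs (endpoint behaviour is assumed)."""
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    decay_epochs = total_epochs - warmup_epochs
    return peak_lr * max(0.0, 1 - (epoch - warmup_epochs) / decay_epochs)

def combined_loss(l1_homo_lumo, ce_noisy_nodes, ce_noisy_edges):
    """Combine losses in the stated 1 : 1.2 : 1.2 ratio for
    HOMO-LUMO : Noisy Nodes : Noisy Edges."""
    return l1_homo_lumo + 1.2 * ce_noisy_nodes + 1.2 * ce_noisy_edges

print(learning_rate(9))              # → 0.0004 (end of warmup, at peak)
print(combined_loss(1.0, 0.5, 0.5))  # → 2.2
```

In a PyTorch training loop, the same shape of schedule would typically be supplied to the optimiser via a per-step or per-epoch LR lambda rather than set manually.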