Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design

Authors: Jiannan Yang, Veronika Thost, Tengfei Ma

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental This work casts the entire pretrain-finetune workflow into a unified probabilistic framework, enabling a transparent comparison and deeper understanding of masking strategies. Building on this formalism, we conduct a controlled study of three core design dimensions: masking distribution, prediction target, and encoder architecture, under rigorously controlled settings. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no consistent benefit over uniform sampling for common node-level prediction tasks.
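The uniform-sampling baseline referred to in the finding above can be sketched in a few lines; this is an illustrative sketch, not the authors' implementation, and the helper name `uniform_node_mask` is assumed. The 0.15 mask ratio matches the AttrMask-family setting reported in the experiment setup.

```python
import random

def uniform_node_mask(num_nodes, mask_ratio=0.15, seed=0):
    """Uniformly sample a fraction of node indices to mask.

    Illustrative sketch of the uniform-sampling baseline; every node
    is equally likely to be masked, with no structural weighting.
    """
    rng = random.Random(seed)
    k = max(1, round(num_nodes * mask_ratio))
    return sorted(rng.sample(range(num_nodes), k))

masked = uniform_node_mask(20, mask_ratio=0.15)
print(masked)  # 3 distinct node indices out of 20
```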
Researcher Affiliation Collaboration Jiannan Yang (EMAIL), Stony Brook University; Veronika Thost (EMAIL), MIT-IBM Watson AI Lab; Tengfei Ma (EMAIL), Stony Brook University
Pseudocode Yes Algorithm 1: Perturbed PageRank-Based Masking (StructMAE-P) ... Algorithm 2: Perturbed Learnable Masking (StructMAE-L)
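The paper's Algorithm 1 is given only by name here. A minimal sketch of the general idea behind perturbed PageRank-based masking, assuming an adjacency-list graph, power-iteration PageRank, and Gaussian score perturbation (the function names, the noise model, and the top-k selection are all assumptions, not the authors' exact procedure):

```python
import random

def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency list {node: [neighbors]}."""
    n = len(adj)
    scores = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            if nbrs:
                share = damping * scores[v] / len(nbrs)
                for u in nbrs:
                    new[u] += share
            else:  # dangling node: spread its mass uniformly
                for u in adj:
                    new[u] += damping * scores[v] / n
        scores = new
    return scores

def perturbed_pagerank_mask(adj, mask_ratio=0.25, noise=0.1, seed=0):
    """Mask the nodes with the highest noise-perturbed PageRank scores."""
    rng = random.Random(seed)
    scores = pagerank(adj)
    perturbed = {v: s + rng.gauss(0.0, noise * s) for v, s in scores.items()}
    k = max(1, round(len(adj) * mask_ratio))
    return sorted(sorted(perturbed, key=perturbed.get, reverse=True)[:k])

# Toy molecular-like graph: a six-membered ring with one pendant atom.
graph = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4],
         4: [3, 5], 5: [4, 0, 6], 6: [5]}
print(perturbed_pagerank_mask(graph))  # 2 structurally central node ids
```

The 0.25 mask ratio follows the GraphMAE/StructMAE setting reported in the experiment setup; the perturbation keeps the mask stochastic across epochs rather than always hiding the same high-centrality nodes.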
Open Source Code No No explicit statement or link providing concrete access to source code for the methodology described in this paper was found. The mention of 'Reviewed on OpenReview: https://openreview.net/forum?id=TE4vcYWRcc' refers to the review process, not code availability.
Open Datasets Yes We adopt a standardized two-stage protocol: (1) self-supervised pretraining on 2M molecules sampled from ZINC15 (Sterling & Irwin, 2015; Hu et al., 2019), and (2) fine-tuning and evaluation on 11 MoleculeNet benchmarks (Wu et al., 2018), with supplementary validation on curated datasets from Polaris (Wognum et al., 2024).
Dataset Splits Yes For downstream tasks, we attach a linear prediction head and fine-tune the encoder using scaffold-based 8:1:1 splits.
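The scaffold-based 8:1:1 split mentioned above is commonly implemented as a greedy fill over scaffold groups. A minimal sketch, assuming scaffold keys are precomputed (e.g. Bemis-Murcko scaffolds from RDKit's MurckoScaffold, omitted here); the function name and exact tie-breaking are illustrative, not necessarily the authors' implementation:

```python
from collections import defaultdict

def scaffold_split(scaffold_keys, frac_train=0.8, frac_valid=0.1):
    """Greedy scaffold split: group molecules by scaffold, then fill
    train/valid/test with whole groups (largest first) so no scaffold
    is shared across splits. Returns index lists at roughly 8:1:1.
    """
    groups = defaultdict(list)
    for i, key in enumerate(scaffold_keys):
        groups[key].append(i)
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(scaffold_keys)
    train_cut, valid_cut = frac_train * n, (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cut:
            train += group
        elif len(train) + len(valid) + len(group) <= valid_cut:
            valid += group
        else:
            test += group
    return train, valid, test

# Ten molecules over three scaffolds (SMILES keys are illustrative).
keys = ["c1ccccc1"] * 8 + ["C1CCCCC1"] + ["c1ccncc1"]
tr, va, te = scaffold_split(keys)
print(len(tr), len(va), len(te))  # 8 1 1
```

Splitting by whole scaffold groups is what makes the evaluation out-of-distribution: no test molecule shares a scaffold with any training molecule.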
Hardware Specification Yes All models were pre-trained for 100 epochs on 2 million molecules from ZINC15 using a single NVIDIA A6000 GPU.
Software Dependencies No The paper mentions software components like GIN, GraphGPS, and the Adam optimizer but does not provide specific version numbers for these or other ancillary software dependencies.
Experiment Setup Yes Pretraining. We primarily compare two encoder backbones: GIN and GraphGPS, both implemented with edge-aware GINE layers (Hu et al., 2019). The hidden dimension is fixed to 300, trained for 100 epochs with the Adam optimizer. To ensure fairness, mask ratios follow prior work but are aligned where needed: 0.15 for AttrMask-family baselines, 0.25 for GraphMAE/StructMAE, and 0.30 (with 50% intra-motif atom masking) for MotifPred. All other hyperparameters (batch size, dropout, learning rate) are summarized in Table 1.

Table 1: Pretraining configuration of two backbone models.
Component          GIN            GraphGPS
Encoder layers     5 GIN layers   5 GPS blocks
Hidden dimension   300            300
Dropout            0.0            0.0
Attention heads    -              8
Optimizer          Adam           Adam
Learning rate      1e-3           1e-3
Batch size         256            256
Dropout rate       0.0 (GIN)      0.5 (Attn)
Epochs             100            100

Fine-tuning. ...

Table 3: Fine-tuning configuration across task types.
Parameter          Classification                   Regression
Prediction head    Linear layer (input dim = 300)   Linear layer (input dim = 300)
Optimizer          Adam                             Adam
Learning rate      1e-3                             1e-3
Epochs             100                              100
Dropout rate       0.5                              0.2
Batch size         32                               256
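For quick reference, the hyperparameters reported in Tables 1 and 3 can be transcribed into plain config dictionaries; the dictionary structure and key names below are illustrative, not the authors' actual configuration format.

```python
# Pretraining configuration (Table 1). Values transcribed from the paper.
PRETRAIN = {
    "GIN":      {"encoder_layers": "5 GIN layers", "hidden_dim": 300,
                 "optimizer": "Adam", "lr": 1e-3, "batch_size": 256,
                 "dropout": 0.0, "epochs": 100},
    "GraphGPS": {"encoder_layers": "5 GPS blocks", "hidden_dim": 300,
                 "attention_heads": 8, "optimizer": "Adam", "lr": 1e-3,
                 "batch_size": 256, "dropout": 0.0, "attn_dropout": 0.5,
                 "epochs": 100},
}

# Mask ratios aligned across baseline families.
MASK_RATIO = {"AttrMask": 0.15, "GraphMAE/StructMAE": 0.25, "MotifPred": 0.30}

# Fine-tuning configuration (Table 3), per task type.
FINETUNE = {
    "classification": {"head": "linear (in_dim=300)", "optimizer": "Adam",
                       "lr": 1e-3, "epochs": 100, "dropout": 0.5,
                       "batch_size": 32},
    "regression":     {"head": "linear (in_dim=300)", "optimizer": "Adam",
                       "lr": 1e-3, "epochs": 100, "dropout": 0.2,
                       "batch_size": 256},
}

print(PRETRAIN["GraphGPS"]["attn_dropout"], FINETUNE["regression"]["batch_size"])
```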