Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Self-supervised Molecular Representation Learning using Persistent Homology

Authors: Yuankai Luo, Lei Shi, Veronika Thost

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.
Researcher Affiliation | Collaboration | Yuankai Luo, Beihang University (EMAIL); Lei Shi, Beihang University (EMAIL); Veronika Thost, MIT-IBM Watson AI Lab, IBM Research (EMAIL)
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/Molecular-homology.
Open Datasets | Yes | For pre-training, we considered the most common dataset following [Hu* et al., 2020]: 2 million unlabeled molecules sampled from the ZINC15 database [Sterling and Irwin, 2015]. For downstream evaluation, we focus on the MoleculeNet benchmark [Wu et al., 2018a] here; the appendix contains experiments on several other datasets.
Dataset Splits | Yes | Finally, scaffold split [Ramsundar et al., 2019] is used to split graphs into train/val/test sets as 80%/10%/10%, which mimics real-world use cases.
Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using a "Graph Isomorphism Network (GIN)" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | During the pre-training stage, GNNs are pre-trained for 100 epochs with a batch size of 256 and a learning rate of 0.001. During the fine-tuning stage, we train for 100 epochs with a batch size of 32 and a dropout rate of 0.5, and report the test performance using ROC-AUC at the best validation epoch.
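The scaffold split quoted in the Dataset Splits row can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes each molecule already has a precomputed scaffold key (in practice derived from its Bemis-Murcko scaffold, e.g. with RDKit), and follows the common convention of assigning whole scaffold groups, largest first, so that no scaffold straddles the train/valid/test boundary.

```python
from collections import defaultdict

def scaffold_split(mol_ids, scaffolds, frac_train=0.8, frac_valid=0.1):
    """Deterministic scaffold split: group molecules by scaffold key,
    then fill train, then valid, then test with whole groups so that
    structurally similar molecules never appear in different splits."""
    groups = defaultdict(list)
    for idx, scaf in zip(mol_ids, scaffolds):
        groups[scaf].append(idx)
    # Assign the largest scaffold groups first (a common convention).
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(mol_ids)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(valid) + len(group) <= n_valid:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole scaffold groups are assigned atomically, the realized split fractions only approximate 80%/10%/10%; the test set ends up containing scaffolds unseen during training, which is what makes this split mimic real-world use.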