Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Self-supervised Molecular Representation Learning using Persistent Homology

Authors: Yuankai Luo, Lei Shi, Veronika Thost

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.
Researcher Affiliation | Collaboration | Yuankai Luo, Beihang University (EMAIL); Lei Shi, Beihang University (EMAIL); Veronika Thost, MIT-IBM Watson AI Lab, IBM Research (EMAIL)
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/Molecular-homology.
Open Datasets | Yes | For pre-training, we considered the most common dataset following [Hu* et al., 2020]: 2 million unlabeled molecules sampled from the ZINC15 database [Sterling and Irwin, 2015]. For downstream evaluation, we focus on the MoleculeNet benchmark [Wu et al., 2018a] here; the appendix contains experiments on several other datasets.
Dataset Splits | Yes | Finally, scaffold split [Ramsundar et al., 2019] is used to split graphs into train/val/test sets as 80%/10%/10%, which mimics real-world use cases.
Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using a "Graph Isomorphism Network (GIN)" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | During the pre-training stage, GNNs are pre-trained for 100 epochs with a batch size of 256 and a learning rate of 0.001. During the fine-tuning stage, we train for 100 epochs with a batch size of 32 and a dropout rate of 0.5, and report the test performance using ROC-AUC at the best validation epoch.
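The scaffold split quoted in the Dataset Splits row can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes each molecule already has a precomputed scaffold key (in practice derived from its Bemis-Murcko scaffold, e.g. with RDKit), and follows the common convention of assigning whole scaffold groups, largest first, so that no scaffold straddles the train/valid/test boundary.

```python
from collections import defaultdict

def scaffold_split(mol_ids, scaffolds, frac_train=0.8, frac_valid=0.1):
    """Deterministic scaffold split: group molecules by scaffold key,
    then fill train, then valid, then test with whole groups so that
    structurally similar molecules never appear in different splits."""
    groups = defaultdict(list)
    for idx, scaf in zip(mol_ids, scaffolds):
        groups[scaf].append(idx)
    # Assign the largest scaffold groups first (a common convention).
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(mol_ids)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(valid) + len(group) <= n_valid:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole scaffold groups are assigned atomically, the realized split fractions only approximate 80%/10%/10%; the test set ends up containing scaffolds unseen during training, which is what makes this split mimic real-world use.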