Curriculum-aware Training for Discriminating Molecular Property Prediction Models

Authors: Hansi Yang, Quanming Yao, James Kwok

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive evaluation on various molecular property prediction datasets validates the effectiveness of our approach. ... Empirical results on various molecular property data sets demonstrate the effectiveness of the proposed method. ... In this section, we demonstrate the performance of the proposed method on both classification data sets (Section 5.1) popularly used in existing works (Stärk et al., 2022; Wang et al., 2023b; Zhou et al., 2023) and regression data sets (Section 5.2) that are more common in real-world applications (van Tilborg et al., 2022). Section 5.3 presents ablation studies to verify the effectiveness of each component in the proposed method. The effect of the hyper-parameters that define R(t) is studied in Section 5.4. Section 5.5 further visualizes the loss distribution on molecules. Section 5.6 presents case studies to better understand the proposed method.
Researcher Affiliation Academia Hansi Yang, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China; Quanming Yao, Department of Electronic Engineering, State Key Laboratory of Space Network and Communications, Tsinghua University, Beijing, China; James Kwok, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Pseudocode Yes Algorithm 1 Learning with Activity Cliff (LAC). 1: Initialize prediction model f with parameter w (random initialization or pre-trained weights); 2: for t = 0, ..., T − 1 do 3: Draw a mini-batch B from molecule data set D; 4: Obtain the set A of molecule pairs in B with activity cliff; 5: Determine R(t); 6: Select R(t)·|B| large-loss samples B̂ from B based on network f's predictions; 7: Select R(t)·|A| pairs of molecules Â with activity cliffs and compute Le in (4); 8: Update w = w − η∇w(L(w; B̂) + αLe(w; Â)); 9: end for
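The core of the algorithm's line 6 is selecting the R(t)·|B| largest-loss samples from each mini-batch. A minimal sketch of that selection step in plain Python follows; the function name and the example losses are illustrative, not from the authors' code.

```python
def select_large_loss(losses, fraction):
    """Return indices of the round(fraction * |B|) samples with the largest loss.

    `losses` is a list of per-sample losses computed from the network f's
    predictions; `fraction` is R(t) at the current step t.
    """
    k = max(1, round(fraction * len(losses)))
    # Sort batch indices by loss, largest first, and keep the top k.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]

# Example: with R(t) = 0.5, half the batch (the largest losses) is selected.
batch_losses = [0.1, 0.9, 0.4, 0.7]
print(select_large_loss(batch_losses, 0.5))  # -> [1, 3]
```

The same top-k pattern applies to line 7, ranking molecule pairs in A rather than individual samples.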
Open Source Code No The paper does not provide a specific repository link, an explicit statement of code release, or mention code in supplementary materials for their methodology. It mentions using existing models like GIN, GraphGPS, 3D-PGT, Uni-Mol, and MLP(ECFP), but not releasing their own implementation of LAC.
Open Datasets Yes We consider four tasks from the Tox21 data set (Wu et al., 2018) which predict a molecule's response to different receptors (NR-AhR, NR-ER, SR-ARE and SR-MMP). ... We perform experiments on eight classification tasks from MoleculeNet (Wu et al., 2018): Tox21, ToxCast, Sider, MUV, Bace, BBBP, ClinTox and HIV. ... we select five data sets from the ChEMBL database (Zdrazil et al., 2023), which describe the (continuous) bioactivity values of molecules to a specific target.
Dataset Splits Yes For the classification experiments, the data splits of all data sets in our experiments follow the scaffold split in (Wang et al., 2023b). For the regression experiments, the data splits of all data sets in our experiments are the same as in (van Tilborg et al., 2022), and we use a three-layer MLP model with input dimension 1024 and hidden dimension 512 for all hidden layers.
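The regression backbone described above (a three-layer MLP over 1024-dimensional ECFP fingerprints with 512-dimensional hidden layers) can be sketched as a NumPy forward pass. The layer breakdown (two hidden layers plus a scalar output) and the ReLU nonlinearity are assumptions about details the excerpt does not state.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [1024, 512, 512, 1]  # ECFP input -> hidden -> hidden -> scalar bioactivity
# He-style initialization; biases start at zero.
weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a) for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

def mlp_forward(x):
    """Forward pass: affine layers with ReLU on all but the output layer."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

fp = rng.integers(0, 2, size=(8, 1024)).astype(float)  # a batch of 8 binary fingerprints
print(mlp_forward(fp).shape)  # (8, 1)
```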
Hardware Specification Yes All experiments are run on a single NVIDIA RTX A6000 GPU.
Software Dependencies No The paper mentions using the Adam optimizer (Kingma & Ba, 2015), but does not specify version numbers for any programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other software components used for implementation.
Experiment Setup Yes All experiments are run on a single NVIDIA RTX A6000 GPU. For all experiments in this work, we use the Adam optimizer (Kingma & Ba, 2015) and follow its default hyper-parameters: the learning rate η is set to 0.001, the first-order momentum weight β1 is set to 0.9, and the second-order momentum weight β2 is set to 0.99. The batch size is set to 256 for all data sets. Unless otherwise specified, we set the R(t) schedule as R(t) = λ min(t/(γT), 1) with λ = 0.2 and γ = 0.1, and the weight α for the pairwise loss Le is set to 0.1.
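With λ = 0.2 and γ = 0.1 as stated, the schedule R(t) = λ·min(t/(γT), 1) ramps linearly from 0 to 0.2 over the first 10% of training and then stays flat. A small sketch makes the shape concrete; the total step count T below is an arbitrary example, not a value from the paper.

```python
def R(t, T, lam=0.2, gamma=0.1):
    """Curriculum fraction R(t) = lambda * min(t / (gamma * T), 1)."""
    return lam * min(t / (gamma * T), 1.0)

T = 1000  # example total number of training steps (not specified in the excerpt)
# Linear warm-up until t = gamma*T = 100, constant at lambda = 0.2 afterwards.
print([round(R(t, T), 3) for t in (0, 50, 100, 500)])  # [0.0, 0.1, 0.2, 0.2]
```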