M^2LLM: Multi-view Molecular Representation Learning with Large Language Models

Authors: Jiaxin Ju, Yizhen Zheng, Huan Yee Koh, Can Wang, Shirui Pan

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4.1 Experimental Setup (Dataset): Our framework is evaluated on 8 datasets spanning 34 tasks from MoleculeNet [Wu et al., 2018], including physiology-related tasks like BBBP [Martins et al., 2012], ClinTox [Gayvert et al., 2016], and 27 SIDER tasks [Kuhn et al., 2016] for adverse drug reaction prediction. Additionally, we evaluate classification tasks from BACE [Subramanian et al., 2016] and HIV [Wu et al., 2018], as well as regression tasks from ESOL [Delaney, 2004], FreeSolv [Mobley and Guthrie, 2014], and Lipophilicity [Wu et al., 2018]. ... 4.2 Performance on Classification Tasks: We evaluate M^2LLM on five classification datasets with 31 subtasks, as shown in Table 1. We report the mean and standard deviation over 10 random seeds using receiver operating characteristic area under the curve (ROC-AUC, %) as the evaluation metric, where higher scores indicate better performance. ... 4.3 Performance on Regression Tasks: We evaluate M^2LLM on three regression tasks, as shown in Table 2. We report the RMSE for regression, where lower values signify better results.
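The two evaluation metrics named in the quote (ROC-AUC for classification, RMSE for regression) can be sketched in plain Python. This is an illustrative re-implementation for reference only; the paper's actual evaluation code is not available, and in practice one would use a library routine such as sklearn's roc_auc_score.

```python
import math

def roc_auc(labels, scores):
    """Binary ROC-AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of predicted scores (higher = more positive).
    """
    # Assign 1-based ranks to scores, averaging ranks within tie groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    # U statistic of the positives, normalized to [0, 1].
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def rmse(y_true, y_pred):
    """Root mean squared error, the regression metric reported in Table 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` yields 0.75, matching the standard library implementations on this classic test case.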
Researcher Affiliation | Academia | Jiaxin Ju1, Yizhen Zheng2, Huan Yee Koh2,3, Can Wang1 and Shirui Pan1; 1School of Information and Communication Technology, Griffith University; 2Department of Data Science and AI, Monash University; 3Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the M^2LLM framework components and their interactions using narrative text and diagrams (Figure 1 and Figure 2), but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Our framework is evaluated on 8 datasets spanning 34 tasks from MoleculeNet [Wu et al., 2018], including physiology-related tasks like BBBP [Martins et al., 2012], ClinTox [Gayvert et al., 2016], and 27 SIDER tasks [Kuhn et al., 2016] for adverse drug reaction prediction. Additionally, we evaluate classification tasks from BACE [Subramanian et al., 2016] and HIV [Wu et al., 2018], as well as regression tasks from ESOL [Delaney, 2004], FreeSolv [Mobley and Guthrie, 2014], and Lipophilicity [Wu et al., 2018].
Dataset Splits | Yes | We use the scaffold splitting method recommended by MoleculeNet [Wu et al., 2018], which assigns molecules with distinct structural scaffolds to separate training, validation, and test sets. This method, unlike random splitting, ensures structural dissimilarity between sets, creating a more challenging evaluation scenario.
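The scaffold split quoted above can be sketched as follows. In practice the grouping key would be a Bemis-Murcko scaffold SMILES (e.g. from RDKit's MurckoScaffold module); to keep this sketch self-contained we take the scaffold keys as precomputed input. The largest-group-first greedy assignment is an assumption; MoleculeNet/DeepChem implementations differ in detail.

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Deterministic scaffold split.

    scaffolds: dict mapping molecule index -> scaffold key (in real use,
    a Bemis-Murcko scaffold SMILES string).
    Whole scaffold groups are assigned to one split, so no scaffold is
    shared between train, validation, and test.
    """
    groups = defaultdict(list)
    for idx, scaf in scaffolds.items():
        groups[scaf].append(idx)
    # Place the largest scaffold groups first, for a stable assignment.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g))
    n = len(scaffolds)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(valid) + len(group) <= n_valid:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because groups are never broken up, every molecule in the test set has a scaffold unseen during training, which is what makes this split harder than a random one.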
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. It mentions using various LLM models (Galactica, LLaMa-3.1, OpenAI's models) but not the underlying hardware for the M^2LLM framework's experiments.
Software Dependencies | No | The paper mentions using specific LLM models such as 'Galactica models (6.7B and 30B parameters) [Taylor et al., 2022]', 'LLaMa-3.1 models (8B and 8B-instruct) [Dubey et al., 2024]', and 'OpenAI's closed-source text embedding models (small and large configurations) [OpenAI, 2024]'. However, it does not provide specific version numbers for these or other ancillary software libraries or programming languages required for reproduction.
Experiment Setup | No | The paper describes the overall framework, including the multi-layer perceptron (MLP) for prediction and the loss functions used (cross-entropy for classification, RMSE for regression). It also states that 'The framework is trained to optimize the weights α_i^struct, α_i^task, and α_i^rule, and the MLP parameters using task-specific loss functions.' However, it does not provide concrete hyperparameter values for the MLP (e.g., number of layers, hidden dimensions, learning rate, batch size, number of epochs) or other system-level training settings for the M^2LLM framework.
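The view-weighting step quoted in this row could look like the following minimal sketch. Since the paper gives no code or hyperparameters, everything here is an assumption: the softmax normalization of the learned weights, the function names, and the binary cross-entropy form of the classification loss.

```python
import math

def fuse_views(e_struct, e_task, e_rule, alphas):
    """Weighted fusion of the three embedding views (structure-, task-,
    and rule-based) into one molecule representation.

    alphas: three learnable scalars (the paper's alpha_struct, alpha_task,
    alpha_rule); normalizing them with a softmax is our assumption.
    """
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    w = [e / total for e in exps]
    return [w[0] * s + w[1] * t + w[2] * r
            for s, t, r in zip(e_struct, e_task, e_rule)]

def bce_loss(p, y):
    """Binary cross-entropy, the classification loss named in the paper.

    p: predicted probability in (0, 1); y: ground-truth label in {0, 1}.
    """
    eps = 1e-12  # guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
```

With all three alphas equal, the fused vector is the plain average of the views; training would then move the alphas to favor whichever view is most informative for the task.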