A Strong Baseline for Molecular Few-Shot Learning

Authors: Philippe Formont, Hugo Jeannin, Pablo Piantanida, Ismail Ben Ayed

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our methods on the FS-mol benchmark (Stanley et al., 2021). ... Table 1 presents the results of the different methods on the FS-mol test set. ... Figure 2: Evolution of the model's performance on the validation set during the few-shot adaptation."
Researcher Affiliation | Academia | Philippe Formont: International Laboratory on Learning Systems, Montreal, Canada; Université Paris-Saclay; École de technologie supérieure; MILA. Hugo Jeannin: International Laboratory on Learning Systems, Montreal, Canada; École de technologie supérieure. Pablo Piantanida: International Laboratory on Learning Systems, Montreal, Canada; Université Paris-Saclay; MILA. Ismail Ben Ayed: International Laboratory on Learning Systems, Montreal, Canada; École de technologie supérieure.
Pseudocode | Yes | A.1 Algorithm, Procedure 1 (Training of the quadratic probe):

Input: a support set S = {(z_i, y_i)}_{i <= N}, the model's weights {w_k}_{k in {0,1}}, a learning rate α and a shrinkage coefficient λ.
for i in {1, ..., N_epochs} do
    for k in {0, 1} do
        S_k ← {z_i | y_i = k}
        M_k ← [ (1 - λ) · (1/|S_k|) · Σ_{z in S_k} (z - w_k)(z - w_k)^T + λ I ]^{-1}   {compute the shrinkage-regularized inverse covariance matrices}
    end for
    Θ ← {w_k, M_k}_{k in {0,1}}
    L_ce(Θ) ← (1/|S|) · Σ_{(z,y) in S} L_ce(z, y, Θ)
    for k in {0, 1} do
        w_k ← w_k - α ∇_{w_k} L_ce(Θ)   {update the weights with gradient descent}
    end for
end for
Return Θ
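The procedure above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name, the logit form (negative Mahalanobis distance to each class weight, with M_k held fixed during the gradient step), and all hyper-parameter defaults are assumptions; the released codebase is the reference.

```python
import numpy as np

def train_quadratic_probe(S_z, S_y, w, alpha=0.1, lam=0.2, n_epochs=50):
    """Sketch of Procedure 1: train a quadratic probe on a support set.

    S_z : (N, d) support embeddings, S_y : (N,) binary labels in {0, 1},
    w   : (2, d) initial class weights. Illustrative only.
    """
    d = S_z.shape[1]
    for _ in range(n_epochs):
        # Shrinkage-regularized inverse covariance per class (M_k).
        M = []
        for k in (0, 1):
            Zk = S_z[S_y == k] - w[k]
            cov = (1 - lam) * (Zk.T @ Zk) / len(Zk) + lam * np.eye(d)
            M.append(np.linalg.inv(cov))
        # Assumed logit form: negative Mahalanobis distance to each w_k.
        logits = np.stack(
            [-np.einsum("nd,de,ne->n", S_z - w[k], M[k], S_z - w[k])
             for k in (0, 1)],
            axis=1,
        )
        # Cross-entropy gradient w.r.t. the weights only (M held fixed):
        # d logits[:, k] / d w_k = 2 * M_k (z - w_k).
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        onehot = np.eye(2)[S_y]
        for k in (0, 1):
            coeff = (p[:, k] - onehot[:, k])[:, None]            # (N, 1)
            grad = (coeff * (2 * (S_z - w[k]) @ M[k])).mean(0)
            w[k] -= alpha * grad                                 # gradient descent
    return w, M
```

After training, a query point z would be classified by the same Mahalanobis logits, argmax over k of -(z - w_k)^T M_k (z - w_k), under this sketch's assumptions.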
Open Source Code | Yes | "The anonymized codebase used in this paper is available at https://github.com/Fransou/Strong-Baseline-Molecular-FSL"
Open Datasets | Yes | "We evaluate our methods on the FS-mol benchmark (Stanley et al., 2021). ... We designed scenarios extracted from the Therapeutic Data Commons platform (Huang et al., 2021) and the LIT-PCBA dataset (Tran-Nguyen et al., 2020)"
Dataset Splits | Yes | "Finally, for a task t, D_t^test is divided into two sets S_t and Q_t (S_t ∩ Q_t = ∅): a set of labelled data used to adapt the model, i.e., the support set {(x_i, y_i)}_{i in S_t}, and an unlabelled set whose labels are to be predicted, i.e., the query set {(x_i, y_i)}_{i in Q_t}. ... We present the results for support set sizes of 16, 32, 64 and 128 (larger support set sizes would discard some tasks from the test set)."
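The disjoint support/query split described above can be illustrated with a short index-based sketch. This is an assumption for exposition only: the FS-mol benchmark ships its own (stratified) task samplers, and `split_task` below does not guarantee both classes appear in the support set.

```python
import numpy as np

def split_task(X, y, support_size=16, seed=0):
    """Split one few-shot task into a support set S_t (used for adaptation)
    and a disjoint query set Q_t (whose labels are to be predicted).
    Illustrative sketch; real benchmarks use stratified samplers.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    support, query = idx[:support_size], idx[support_size:]
    return (X[support], y[support]), (X[query], y[query])
```

Because `support` and `query` partition a single permutation of the indices, S_t ∩ Q_t = ∅ holds by construction.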
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper.
Software Dependencies | No | The paper mentions RDKit but does not provide specific version numbers for it or any other software dependency. While a codebase is provided, library versions are not listed in the text.
Experiment Setup | Yes | "Hyper-parameters are selected through a hyper-parameter search on the validation set across all support set sizes. All hyper-parameters are kept constant over each support set size, except for the optimal numbers of epochs for our fine-tuning baselines, which are fitted for each support set size on the validation set. ... We chose similarly to use a single value of λ = 0.2 across all support set sizes."