Multivariate Conformal Selection
Authors: Tian Bai, Yue Zhao, Xiang Yu, Archer Y. Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks. (Section 5: Simulation Studies; Section 6: Real Data Application) |
| Researcher Affiliation | Collaboration | 1Department of Mathematics and Statistics, McGill University, Montreal, Canada; 2Department of Mathematics, University of York, York, UK; 3MRL, Merck & Co., Inc., Rahway, NJ, USA; 4Mila - Quebec AI Institute, Montreal, Quebec, Canada. |
| Pseudocode | Yes | Algorithm 1 (mCS: Multivariate Conformal Selection); Algorithm 2 (mCS-learn Learning Procedure) |
| Open Source Code | Yes | The code for reproduction can be found at https://github.com/Tian-Bai/mcs. |
| Open Datasets | Yes | We employ an imputed public ADMET dataset compiled from multiple sources (Wenzel et al., 2019; Iwata et al., 2022; Kim et al., 2023; Watanabe et al., 2018; Falcón-Cano et al., 2022; Esposito et al., 2020; Braga et al., 2015; Aliagas et al., 2022; Perryman et al., 2020; Meng et al., 2022; Vermeire et al., 2022), comprising n = 22805 compounds with d = 15 biological assay responses. [...] The processed dataset contains n = 22805 data points, and can be found at https://github.com/Tian-Bai/mcs. |
| Dataset Splits | Yes | Specifically, we partition the calibration data into three batches Dcal = Df-train ∪ Df-val ∪ D′cal, where Df-train and Df-val are used for training and validating fθ, respectively. [...] The calibration data is split into Df-train, Df-val and D′cal with ratio 8:1:1, and the model fθ is formulated as a two-layer MLP with batch normalization. [...] We train the model using ntrain = 12000 samples, provide ncal = 8000 samples for calibration and reserve the remaining data of size ntest = 2805 as test data. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions training a support vector regression model and using the DeepPurpose Python package, but provides no details on the specific hardware (e.g., GPU/CPU models, cloud resources) used. |
| Software Dependencies | No | We employed Chemprop (Yang et al., 2019; Heid et al., 2023) to impute these entries. The resulting imputed dataset was then used in all subsequent experiments. [...] the underlying predictor µ̂ is specified as a drug property prediction model from the DeepPurpose Python package (Huang et al., 2020) with Morgan drug encoding. The paper mentions software like "Chemprop" and "DeepPurpose" but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We first train a support vector regression model µ̂ using 1000 data points, and use an additional labeled dataset of 1000 samples to construct selection sets for different methods in comparison. [...] The response dimension is set to d = 30, and the nominal FDR level is set at q = 0.3. The number of iterations for validation is set to K = 100. [...] For mCS-learn, the calibration data is split into Df-train, Df-val and D′cal with ratio 8:1:1, and the model fθ is formulated as a two-layer MLP with batch normalization. [...] We adopt the clipped score (8) for mCS-dist, and adopt the loss function in (16) with balancing coefficient γ = 0.5 for mCS-learn. [...] For the second task, the target region is defined as a sphere {y : ‖y − c‖₂ ≤ r}. For convenience, we take the center c of the sphere to be the same as the cutoffs ck in task 1, and let r = 2.4. |
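The data partitioning described in the "Dataset Splits" row (train/calibration/test of 12000/8000/2805 points, with the calibration set further split 8:1:1 into Df-train, Df-val and D′cal for mCS-learn) can be sketched as below. The function name and interface are illustrative, not from the paper; only the counts and the 8:1:1 ratio are taken from the quoted text.

```python
import numpy as np

def split_for_mcs(n_total, n_train=12000, n_cal=8000, seed=0):
    """Illustrative sketch of the paper's data partitioning:
    train / calibration / test, with the calibration indices further
    split 8:1:1 into Df-train, Df-val and D'cal (for mCS-learn)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    # first n_train points for model training, next n_cal for calibration,
    # the remainder (n_total - n_train - n_cal) held out as test data
    train, cal, test = np.split(idx, [n_train, n_train + n_cal])
    # 8:1:1 split of the calibration indices
    n8, n9 = int(0.8 * len(cal)), int(0.9 * len(cal))
    f_train, f_val, cal_prime = cal[:n8], cal[n8:n9], cal[n9:]
    return train, (f_train, f_val, cal_prime), test

train, (f_tr, f_va, cal_p), test = split_for_mcs(22805)
print(len(train), len(f_tr), len(f_va), len(cal_p), len(test))
# 12000 6400 800 800 2805
```

With n = 22805 this reproduces the sizes reported in the table: ntest = 2805 test points, and an 8000-point calibration set split into 6400/800/800.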