Learning Set Functions with Implicit Differentiation

Authors: Gözde Özcan, Chengzhi Shi, Stratis Ioannidis

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate the efficiency of our method on synthetic and real-world subset selection applications including product recommendation, set anomaly detection and compound selection tasks." "We evaluate our proposed method on five datasets including set anomaly detection, product recommendation, and compound selection tasks (see Tab. 1 and App. I of Ozcan, Shi, and Ioannidis (2024) for a datasets summary and for detailed dataset descriptions)."
Researcher Affiliation | Academia | Gözde Özcan, Chengzhi Shi, Stratis Ioannidis, Northeastern University, Boston, MA 02115, USA, EMAIL
Pseudocode | Yes | Algorithm 1: DiffMF (Ou et al. 2022) ... Algorithm 2: iDiffMF
Open Source Code | Yes | We include our code in the supplementary material and will make it public after the review process.
Open Datasets | Yes | The Gaussian and Moons are synthetic datasets, while the rest are real-world datasets. We use the CelebA (Liu et al. 2015) dataset for set anomaly detection. The Amazon Product Recommendation (PR) dataset consists of product review data from Amazon.com (Ni et al. 2019). BindingDB (Gilson et al. 2016) is a public, medicinal chemistry-oriented database that contains binding affinities of proteins with small drug-like molecules.
Dataset Splits | Yes | More specifically, we partition each dataset into a training set and a hold-out/test set (see Tab. 1 of Ozcan, Shi, and Ioannidis (2024) for split ratios). We then divide the training dataset into 5 folds.
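The splitting protocol quoted above (a train/hold-out partition, with the training portion divided into 5 folds for cross-validation) can be sketched as follows. This is an illustrative sketch, not the authors' code; the function name and seed are hypothetical, and the actual split ratios are in Tab. 1 of the paper.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and partition them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding over the shuffled list yields k disjoint folds covering all indices.
    return [idx[i::k] for i in range(k)]

# Each fold serves once as the validation set; the remaining k-1 folds are used for training.
folds = kfold_indices(100, k=5)
```

In cross-validation, each of the 5 folds is held out in turn while the model trains on the other 4, and the hold-out/test set is never touched during model selection.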
Hardware Specification | No | During training, we track the amount of memory used every 5 seconds with the nvidia-smi command while varying the number of maximum iterations. For each number of maximum iterations, we report the minimum, maximum, and average memory usage.
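The memory-tracking protocol quoted above could be reproduced with a small polling script like the sketch below. This is an assumption about how the measurement was taken, not the authors' script; it uses nvidia-smi's standard `--query-gpu=memory.used` query and summarizes the samples with the same min/max/average statistics the report mentions.

```python
import subprocess

def sample_gpu_memory_mib():
    # One nvidia-smi query: used GPU memory in MiB for the first GPU.
    # (Called in a loop every 5 seconds during training, per the quoted protocol.)
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"], text=True)
    return int(out.splitlines()[0])

def summarize(samples):
    # Minimum, maximum, and average memory usage over the collected samples.
    return min(samples), max(samples), sum(samples) / len(samples)
```

Polling an external tool gives process-independent measurements, but as the "No" result notes, it does not by itself document the GPU model or host hardware.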
Software Dependencies | No | We use the PyTorch code repository provided by Ou et al. (2022) for all three competitor algorithms. We use the JAX+Flax framework (Bradbury et al. 2018; Frostig, Johnson, and Leary 2018; Heek et al. 2023) for its functional programming abilities for our iDiffMF implementations. In particular, we implement implicit differentiation using the JAXopt library (Blondel et al. 2022).
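The core idea behind the implicit differentiation the paper delegates to JAXopt can be illustrated in a few lines of plain Python: run a fixed-point solver forward, then obtain the gradient of the fixed point from the implicit function theorem instead of backpropagating through the solver's iterations. The toy map below (`x ← 0.5·cos(x) + θ`, a contraction) is a hypothetical stand-in for the paper's mean-field iteration, not the actual iDiffMF update.

```python
import math

def fixed_point(theta, iters=100):
    # Forward pass: iterate x <- f(x, theta) = 0.5*cos(x) + theta to convergence.
    x = 0.0
    for _ in range(iters):
        x = 0.5 * math.cos(x) + theta
    return x

def implicit_grad(theta):
    # Implicit function theorem at the fixed point x* = f(x*, theta):
    #   dx*/dtheta = (1 - df/dx)^(-1) * df/dtheta,
    # evaluated only at x*, so no solver iterations are stored for backprop.
    x = fixed_point(theta)
    dfdx = -0.5 * math.sin(x)   # d/dx [0.5*cos(x) + theta]
    dfdtheta = 1.0              # d/dtheta [0.5*cos(x) + theta]
    return dfdtheta / (1.0 - dfdx)

# Sanity check against a central finite-difference estimate.
eps = 1e-6
fd = (fixed_point(0.3 + eps) - fixed_point(0.3 - eps)) / (2 * eps)
```

This constant-memory property is exactly why the memory-usage experiment above varies the number of solver iterations: with implicit differentiation, memory should not grow with the iteration count.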
Experiment Setup | Yes | "As per Ou et al., we set the number of iterations to K = 5 for all datasets." "As per Ou et al. (2022), we set K = 1 for all datasets." "We explore the following hyper-parameters: learning rate η, number of layers L, and different forward and backward solvers. Additional details, including ranges and optimal hyperparameter combinations, can be found in App. I.7 of Ozcan, Shi, and Ioannidis (2024)."
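The hyper-parameter search described above (learning rate η, number of layers L, forward/backward solver choice) amounts to a grid sweep. The sketch below shows the shape of such a sweep; the specific values are placeholders, since the real ranges are given only in App. I.7 of the paper.

```python
from itertools import product

# Hypothetical grids; the actual ranges are listed in App. I.7 of the paper.
learning_rates = [1e-3, 1e-2]            # eta
num_layers = [2, 3]                      # L
solvers = ["fixed_point", "anderson"]    # forward/backward solver choice

# Every (eta, L, solver) combination would be scored by cross-validation
# on the 5 training folds, keeping the best-performing configuration.
grid = list(product(learning_rates, num_layers, solvers))
```

A full Cartesian product like this grows multiplicatively with each added hyper-parameter, which is why papers typically report only the explored ranges and the single best combination.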