Joker: Joint Optimization Framework for Lightweight Kernel Machines

Authors: Junhong Zhang, Zhihui Lai

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that Joker saves up to 90% of memory while achieving training time and performance comparable to, or better than, state-of-the-art methods. Table 4 shows the results of the performance comparison.
Researcher Affiliation | Academia | (1) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China; (2) Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen, 518060, China. Correspondence to: Zhihui Lai <lai zhi EMAIL>.
Pseudocode | Yes | Algorithm 1: Trust region (twice-differentiable f); Algorithm 2: Truncated CG-Steihaug; Algorithm 3: DBCD-TR for problem (4).
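Algorithm 2, truncated CG-Steihaug, is a standard routine for the trust-region subproblem: approximately minimize g·p + ½ p·Hp subject to ‖p‖ ≤ Δ, terminating early on negative curvature or when the step hits the region boundary. The sketch below is a generic textbook version, not the paper's implementation; the function name, tolerances, and dense NumPy arrays are assumptions for illustration.

```python
import numpy as np

def cg_steihaug(H, g, delta, tol=1e-8, max_iter=100):
    """Approximately minimize g @ p + 0.5 * p @ H @ p s.t. ||p|| <= delta."""
    p = np.zeros_like(g)
    r = g.copy()          # residual = gradient of the quadratic model at p
    d = -r                # initial search direction
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Hd = H @ d
        dHd = d @ Hd
        if dHd <= 0:
            # Negative curvature: follow d to the trust-region boundary
            return _to_boundary(p, d, delta)
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # Full CG step leaves the region: stop at the boundary
            return _to_boundary(p, d, delta)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _to_boundary(p, d, delta):
    # Solve ||p + tau * d|| = delta for the positive root tau
    a = d @ d
    b = 2 * (p @ d)
    c = p @ p - delta ** 2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p + tau * d
```

With H = I and g = (−1, 0), the unconstrained minimizer is p = (1, 0); shrinking Δ below 1 instead returns the boundary point (Δ, 0), matching the early-termination logic above.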
Open Source Code | Yes | The implementation of Joker is based on PyTorch without extra acceleration libraries. Code available at GitHub: https://github.com/Apple-Zhang/Joker-paper.
Open Datasets | Yes | Million Song Dataset (MSD) (Bertin-Mahieux et al., 2011): contains audio features for year prediction; available at https://archive.ics.uci.edu/dataset/203/yearpredictionmsd. Household Electric Power Consumption (HEPC) dataset: the same as the House Elec dataset used in (Lin et al., 2024); available via the Python package https://github.com/treforevans/uci_datasets. Supersymmetric particle classification (SUSY) dataset (Baldi et al., 2014): a binary classification task to distinguish supersymmetric particles from background processes; available at https://archive.ics.uci.edu/dataset/279/susy. HIGGS dataset (Baldi et al., 2014): a binary classification task to distinguish the Higgs boson from the background process; available at https://archive.ics.uci.edu/dataset/280/higgs. CIFAR-5M dataset (Nakkiran et al., 2021): a generated dataset based on CIFAR-10; available at https://github.com/preetum/cifar5m.
Dataset Splits | Yes | Table A.1 (summary of datasets used in experiments): MSD: 90% train / 10% test; HEPC: 90% train / 10% test; SUSY: 80% train / 20% test; HIGGS: 80% train / 20% test; CIFAR-5M: 80% train / 20% test.
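Fixed-fraction splits like those in Table A.1 are straightforward to reproduce with a seeded shuffle. The sketch below is illustrative only; the seed and shuffling scheme are assumptions, not details from the paper.

```python
import numpy as np

def train_test_split(n_samples, test_frac, seed=0):
    """Return (train_idx, test_idx) for a seeded shuffled partition."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    return idx[n_test:], idx[:n_test]

# e.g. a 90% / 10% split as used for MSD and HEPC
train_idx, test_idx = train_test_split(1000, 0.10)
```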
Hardware Specification | Yes | We implemented KRR, KLR, SVM, etc. based on Joker. To highlight that Joker can obtain promising performance under a limited computational budget, we conducted experiments on a machine with a single consumer GPU (NVIDIA RTX 3080, 10GB) and 64GB RAM.
Software Dependencies | No | The implementation of Joker is based on PyTorch without extra acceleration libraries. Explanation: the paper mentions PyTorch as a dependency but does not specify its version number, or versions for any other key software components, which is required for reproducibility.
Experiment Setup | Yes | The regularization parameter λ in Joker is tuned from {2^i : i = −7, −6, ..., 7} via grid search. In most cases, we employ the inexact Joker models with block size |B| = 512. We also increase the block size to 1024 for Joker-KLR to accelerate convergence. Details of further parameter settings are shown in Appendix C; Table A.3 lists the major hyperparameters of the models in the experiments.
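The grid search over λ ∈ {2^i : i = −7, ..., 7} can be sketched as below, here scoring each candidate by validation MSE for plain kernel ridge regression. This is a generic illustration of the tuning loop, not Joker itself; the RBF kernel, its bandwidth, and the helper names are all assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(K, y, lam):
    """Solve the KRR system (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def grid_search_lambda(X_tr, y_tr, X_val, y_val):
    """Pick lambda from {2^i : i = -7..7} by validation MSE."""
    K_tr = rbf_kernel(X_tr, X_tr)
    K_val = rbf_kernel(X_val, X_tr)
    best_mse, best_lam = np.inf, None
    for i in range(-7, 8):
        lam = 2.0 ** i
        alpha = fit_krr(K_tr, y_tr, lam)
        mse = np.mean((K_val @ alpha - y_val) ** 2)
        if mse < best_mse:
            best_mse, best_lam = mse, lam
    return best_lam
```

In practice the same loop would wrap whichever Joker model is being tuned; only the fit/score calls change.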