Understanding Generalization in Quantum Machine Learning with Margins
Authors: Tak Hur, Daniel K. Park
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental studies on the quantum phase recognition dataset demonstrate that margin-based metrics are strong predictors of generalization performance, outperforming traditional metrics like parameter count. By connecting this margin-based metric to quantum information theory, we demonstrate how to enhance the generalization performance of QML through a classical-quantum hybrid approach when applied to classical data. |
| Researcher Affiliation | Academia | 1Department of Statistics and Data Science, Yonsei University, Seoul, Republic of Korea 2Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea. Correspondence to: Tak Hur <EMAIL>, Daniel K. Park <EMAIL>. |
| Pseudocode | No | The paper describes methods through mathematical formulations and textual descriptions of processes and algorithms, but it does not contain any explicitly labeled pseudocode blocks or algorithms with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions using third-party tools like PennyLane but not its own implementation code. |
| Open Datasets | Yes | Figure 4 presents the classification of MNIST (LeCun et al., 2010), Fashion-MNIST (Xiao et al., 2017), and Kuzushiji-MNIST (Clanuwat et al., 2018) datasets using an 8-qubit QCNN with various quantum embedding schemes. |
| Dataset Splits | Yes | The model was trained on 20 data points, evenly split across four classes. A small training sample was deliberately chosen to explore the overfitting regime, where labels were intentionally randomized with noise, following a methodology similar to that of Gil-Fuster et al. (2024). The test accuracy was measured on 1,000 test samples, far exceeding the size of the training set, to provide a robust estimate of true accuracy. ... For the fixed embedding, we used the ZZ Feature Map with three repeated layers. ... Unlike previous experiments, we did not examine the overfitting regime under label noise, choosing instead to utilize the full training and test datasets. |
| Hardware Specification | No | The paper does not explicitly specify the hardware (e.g., GPU, CPU models, or cloud computing resources) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like the Adam optimizer and PennyLane, but it does not provide specific version numbers for these or any other key software dependencies required for replication. |
| Experiment Setup | Yes | The model was trained using the Adam optimizer (Kingma & Ba, 2015), with a learning rate of 0.001 and full-batch updates. The model was trained for up to 5,000 iterations, with early stopping triggered based on a convergence interval of 500 iterations. Specifically, training halted when the difference between the average loss over two consecutive intervals became smaller than the standard deviation of the most recent interval. ... The experimental setup remained the same as before, except for using a batch size of 16 instead of full-batch training. |
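The early-stopping rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the stated criterion, not the authors' implementation; the function name, the interval default, and the loss-history representation are assumptions.

```python
import statistics

def should_stop(losses, interval=500):
    """Early-stopping check matching the paper's stated rule:
    halt when the difference between the average losses of two
    consecutive intervals falls below the standard deviation of
    the most recent interval.

    losses: list of per-iteration training losses so far.
    interval: convergence interval length (paper uses 500).
    """
    # Need at least two full intervals to compare.
    if len(losses) < 2 * interval:
        return False
    recent = losses[-interval:]
    previous = losses[-2 * interval:-interval]
    gap = abs(statistics.fmean(recent) - statistics.fmean(previous))
    return gap < statistics.stdev(recent)
```

In a training loop this check would run once per iteration (or once per interval), alongside the paper's hard cap of 5,000 iterations.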