Rethink the Role of Deep Learning towards Large-scale Quantum Systems

Authors: Yusheng Zhao, Chi Zhang, Yuxuan Du

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address this, we systematically benchmark DL models against traditional ML approaches across three families of Hamiltonians, scaling up to 127 qubits in three crucial ground-state learning tasks while enforcing equivalent quantum resource usage. Our results reveal that ML models often achieve performance comparable to or even exceeding that of DL approaches across all tasks.
Researcher Affiliation | Academia | ¹University of Science & Technology of China, Hefei, China; ²Nanyang Technological University, Singapore. Correspondence to: Chi Zhang <EMAIL>, Yuxuan Du <EMAIL>.
Pseudocode | No | The paper describes experimental procedures and methods verbally, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like formatting.
Open Source Code | No | "Part of the source code for dataset generation is open-sourced in this GitHub Repository." (However, no specific link or repository name is provided, so this does not constitute concrete access.)
Open Datasets | No | The paper states that datasets are constructed using tools like PastaQ.jl (Torlai & Fishman, 2020) and source code provided by PennyLane AI (2022). While these are third-party tools/resources, the paper does not explicitly state that the *specific datasets generated by the authors for their experiments* are publicly available or provide links to them.
Dataset Splits | Yes | We employ test error as a surrogate to evaluate the expected risk R(h) in Eq. (7) by fixing n_te = 200 test examples... The number of training examples n and snapshots M varies depending on the tasks and will be detailed later. Each task is repeated five times under each setting to collect statistical results. The training size and the snapshots are varied as n ∈ {20, 40, 60, 80, 100} and M ∈ {64, 128, 256, 512}... we use test accuracy to evaluate the classification performance, where n_te = 1600 test examples are used for all cases... For LLM4QPE-T, we set n_pre = 100 and M_pre = 512, and the cost of quantum resources in all classifiers satisfies n·M = n_sft·M_sft + n_pre·M_pre, where n_sft ∈ {20, 60, 100} and M = M_sft.
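The equal-quantum-resource constraint quoted above, n·M = n_sft·M_sft + n_pre·M_pre, can be checked numerically. A minimal sketch; the function name and the worked example are ours, not the paper's:

```python
# Hypothetical helper illustrating the equal-quantum-resource
# constraint n * M = n_sft * M_sft + n_pre * M_pre quoted above.
# The function name and example numbers are ours, not the paper's.

def same_quantum_budget(n, M, n_sft, M_sft, n_pre, M_pre):
    """True iff the fine-tuned pipeline (n_sft * M_sft snapshots for
    fine-tuning plus n_pre * M_pre for pretraining) consumes exactly
    the baseline budget of n * M snapshots."""
    return n * M == n_sft * M_sft + n_pre * M_pre

# With M = M_sft = 512, n_pre = 100, M_pre = 512 (values from the
# quoted setup) and n_sft = 20, the baseline needs n = 120 examples.
print(same_quantum_budget(120, 512, 20, 512, 100, 512))  # True
```

The check makes the fairness claim concrete: whatever snapshots the pretraining stage consumes must be subtracted from the fine-tuning budget for the comparison to use equivalent quantum resources.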
Hardware Specification | No | The paper discusses the cost of quantum computing hardware for measurements (e.g., IQM-Garnet, IonQ-Aria, IonQ-Forte) but does not specify the classical computing hardware (e.g., CPU, GPU models) used to train and run the ML/DL models in their experiments.
Software Dependencies | No | We implement the linear regressor h_LR(x^(i); w)... implemented by library neural tangents (Novak et al., 2020; 2022)... We utilize different simulation tools to generate the datasets... constructed by using PastaQ.jl (Torlai & Fishman, 2020)... source code provided in Ref. PennyLane AI (2022)... using Optuna (Akiba et al., 2019). (No specific version numbers are provided for these libraries or tools.)
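For context, an ℓ2-regularized linear regressor of the kind the row above refers to, h_LR(x^(i); w), is typically fitted in closed form. A minimal sketch under our own assumptions (the solver, variable names, and toy data are illustrative, not the paper's implementation via neural tangents):

```python
import numpy as np

# Illustrative ridge regressor in the spirit of the h_LR(x^(i); w)
# baseline quoted above. The closed-form solve and the toy data
# are our sketch, not the paper's implementation.

def fit_ridge(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 via the normal equations:
    w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true

# With negligible regularization the fit recovers w_true almost exactly.
w_hat = fit_ridge(X, y, lam=1e-8)
```

Note that reproducing such a fit exactly still depends on library versions (NumPy's RNG, neural tangents, Optuna), which is precisely what this row flags as unspecified.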
Experiment Setup | Yes | The hyperparameter λ of two ML models is fixed to be 10^3... LLM4QPE-F with about 18.1M parameters is trained under its default setting... For LLM4QPE-T, we set n_pre = 100 and M_pre = 1024... The employed MLPs are composed of only one hidden layer whose dimension varies in {16, 32, 64, 128}... All models are trained in 1000 epochs with an early stopping strategy. We consider ℓ2-regularization weights as hyperparameters.
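The quoted setup (at most 1000 epochs with early stopping) can be sketched as a framework-agnostic stopping rule; the patience value and the toy loss curve below are our assumptions, not values from the paper:

```python
# Illustrative early-stopping rule for the quoted setup: train for up
# to 1000 epochs, stop once validation loss stops improving. The
# patience of 10 and the synthetic loss curve are our assumptions.

def train_with_early_stopping(val_losses, max_epochs=1000, patience=10):
    """Return the epoch at which training stops, given a per-epoch
    validation-loss sequence (a stand-in for a real training step)."""
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs) - 1

# Toy validation curve that improves until epoch 49, then plateaus:
# training halts 10 epochs after the last improvement.
losses = [1.0 / (e + 1) if e < 50 else 0.02 for e in range(1000)]
stop = train_with_early_stopping(losses)
```

Reporting the patience and the monitored metric alongside "early stopping" is what would make this part of the setup fully reproducible.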