Efficient Reasoning Models: A Survey

Authors: Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang

TMLR 2025

Reproducibility assessment (each item gives the variable, the result, and the evaluator's justification):
Research Type: Experimental. This survey aims to provide a comprehensive overview of recent advances in efficient reasoning, categorizing existing works into three key directions... A curated collection of the papers discussed in the survey is available in the authors' GitHub repository: https://github.com/fscdc/Awesome-Efficient-Reasoning-Models. The paper also presents performance tables (e.g., Table 1, Table 5) summarizing empirical findings from various methods.
Researcher Affiliation: Academia. Sicheng Feng (EMAIL): National University of Singapore, Singapore; Nankai University, Tianjin, China. Gongfan Fang (EMAIL): National University of Singapore, Singapore. Xinyin Ma (EMAIL): National University of Singapore, Singapore. Xinchao Wang (EMAIL): National University of Singapore, Singapore. The institutions listed are universities, and the email domains (.edu.cn, .nus.edu.sg) indicate academic affiliations.
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks for its own methodology.
Open Source Code: No. The paper mentions a GitHub repository (https://github.com/fscdc/Awesome-Efficient-Reasoning-Models), but it contains a curated collection of the papers discussed in the survey, not source code for the survey's own methodology. The paper also refers to code released by specific projects it discusses (e.g., DeepScaleR's code and models), but this is not code for the survey's own work.
Open Datasets: Yes. Section 4.2 and Table 6 provide a comprehensive list of datasets and benchmarks used for evaluating reasoning capabilities. For example, GSM8K is listed with 'Hugging Face Dataset' as its source, as are MATH & MATH-500, while ProntoQA is listed with 'GitHub'. These clearly indicate publicly available datasets.
Dataset Splits: No. As a survey, the paper does not conduct its own experiments or specify training/validation/test splits. It reviews how other papers use datasets and benchmarks, but provides no split information for its own methodology.
Hardware Specification: No. As a survey, the paper describes no hardware used to conduct experiments of its own. Mentions of hardware (e.g., '4 A40 GPUs') refer to the setups used in the *surveyed* works, not by the authors of this survey.
Software Dependencies: No. The paper provides no software dependencies with version numbers for its own methodology, as it surveys existing research rather than reporting new experiments.
Experiment Setup: No. As a survey, the paper includes no experimental setup of its own, such as hyperparameters or system-level training settings; such details are typically found in the empirical papers it reviews.