On the Generalization Ability of Next-Token-Prediction Pretraining
Authors: Zhihao Li, Xue Jiang, Liyuan Liu, Xuelin Zhang, Hong Chen, Feng Zheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, experiments on public datasets verify our theoretical findings. Our code is available at https://github.com/Lizeihao/MININTP. To validate the theoretical contribution of this paper, specifically, Theorem 4.22, we performed a set of NTP pre-training experiments in DOMs. These experiments were designed to systematically examine the influence of model parameters and sample size on generalization performance. |
| Researcher Affiliation | Academia | (1) College of Informatics, Huazhong Agricultural University; (2) Department of Computer Science and Engineering, Southern University of Science and Technology; (3) Department of Computer Science, Hong Kong Baptist University; (4) Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, China. Correspondence to: Hong Chen <EMAIL>. |
| Pseudocode | No | The paper defines the architecture of DOMs (Figure 1b, 1c) and presents mathematical formulations for its components (Equations 4, 5, 6, 7), but it does not include any specific section or figure labeled 'Pseudocode' or 'Algorithm', nor does it present structured, step-by-step procedures in a code-like format. |
| Open Source Code | Yes | Our code is available at https://github.com/Lizeihao/MININTP. |
| Open Datasets | Yes | For pretraining, we employ the MiniMind dataset1, while our test set consists of 8,192 samples (with a maximum sequence length of m = 512) carefully selected from the DAMO NLP dataset2. 1https://www.modelscope.cn/datasets/gongjy/minimind_dataset 2https://www.modelscope.cn/datasets/DAMO_NLP/lcsts_test_set |
| Dataset Splits | Yes | For pretraining, we employ the MiniMind dataset1, while our test set consists of 8,192 samples (with a maximum sequence length of m = 512) carefully selected from the DAMO NLP dataset2. (...) we evaluated performance across 50%, 75%, and 100% subsets of the complete pretraining dataset. |
| Hardware Specification | Yes | To optimize efficiency, we employ FlashAttention (Dao et al., 2022) for accelerated attention computation and conduct distributed training on 8 NVIDIA A800-80GB GPUs using DeepSpeed ZeRO-2 (Rajbhandari et al., 2020). |
| Software Dependencies | No | To optimize efficiency, we employ FlashAttention (Dao et al., 2022) for accelerated attention computation and conduct distributed training on 8 NVIDIA A800-80GB GPUs using DeepSpeed ZeRO-2 (Rajbhandari et al., 2020). For optimization, we utilized the AdamW (Loshchilov & Hutter, 2017) optimizer, combined with a cosine learning rate scheduler that includes a 20-step warm-up phase during the initial training stage. The paper mentions software components like FlashAttention, DeepSpeed ZeRO-2, and the AdamW optimizer, but it does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | Our training methodology follows the approach outlined in MiniMind. To optimize efficiency, we employ FlashAttention (Dao et al., 2022) for accelerated attention computation and conduct distributed training on 8 NVIDIA A800-80GB GPUs using DeepSpeed ZeRO-2 (Rajbhandari et al., 2020). For optimization, we utilized the AdamW (Loshchilov & Hutter, 2017) optimizer, combined with a cosine learning rate scheduler that includes a 20-step warm-up phase during the initial training stage. (...) Table 3. Model architectures, training data specifications, hyperparameter configurations, and test PPL (m = 512). |
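The NTP pretraining objective the experiments rely on is the standard average next-token negative log-likelihood. The toy function below (names and shapes are illustrative, not the paper's implementation) computes it in pure Python for a single sequence, where the logit vector at position t scores the token at position t+1:

```python
import math

def next_token_nll(logits, tokens):
    """Average next-token negative log-likelihood (the NTP loss).

    logits: one score vector per context position; the vector at index t
            assigns a score to every vocabulary entry as a prediction of
            tokens[t + 1].
    tokens: the observed token id sequence, len(tokens) == len(logits) + 1.
    Returns the mean of -log softmax(logits[t])[tokens[t + 1]] over t.
    """
    total = 0.0
    for t, scores in enumerate(logits):
        target = tokens[t + 1]
        # log of the softmax normalizer, then subtract the target's score
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[target]
    return total / len(logits)
```

With uniform logits over a vocabulary of size V the loss is log V, the usual sanity check; perplexity is then `math.exp(next_token_nll(...))`, matching the test-PPL metric reported in Table 3.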
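The reported setup combines a cosine learning-rate scheduler with a 20-step linear warm-up. As a minimal sketch of that schedule (the base rate, total step count, and the decay-to-zero floor are assumptions, not values taken from the paper):

```python
import math

def lr_at(step, total_steps, base_lr, warmup_steps=20):
    """Learning rate under linear warm-up followed by cosine decay.

    Linearly ramps from 0 to base_lr over the first `warmup_steps`
    updates, then follows a half-cosine from base_lr down to 0 over the
    remaining steps. Illustrative sketch of the described scheduler.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice this function would be handed to an optimizer wrapper (e.g. as a per-step multiplier for AdamW); the closed form makes the warm-up/decay shape easy to verify in isolation.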