Strong Model Collapse

Authors: Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe

Venue: ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical findings are empirically verified through experiments on language models and neural networks for images.
Researcher Affiliation | Collaboration | Meta FAIR, Concordia University, Mila, NYU, UCLA. Work done while interning at Meta. Correspondence to EMAIL.
Pseudocode | No | The paper describes its methods and analyses through mathematical formulations and textual descriptions; no explicit pseudocode or algorithm blocks are present.
Open Source Code | No | The paper notes that the generation process for the Babi Stories dataset is detailed in the GitHub repository of Zhang et al. (2024a), but does not state that the authors' own implementation code for the methodology described in this paper is available.
Open Datasets | Yes | Toy settings include a random projections model on Gaussian data and shallow networks fully trained on the MNIST dataset (Deng, 2012). The realistic setting uses GPT-2 models trained on Babi Stories (Zhang et al., 2024a), a reproduction of TinyStories (Eldan & Li, 2023) generated with the Mixtral-8x7B open language model (Jiang et al., 2024).
Dataset Splits | Yes | The dataset comprises a training set of 2,200,000 stories and a validation set of 22,000 stories, created by prompting the Mixtral-8x7B model. [...] A validation set is used to select the best checkpoint, and evaluation is conducted on the test set using the clean labels.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or processor types used for running its experiments.
Software Dependencies | No | The paper mentions using a GPT-2-small model but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The regularization parameter λ is set to a very small value (10⁻⁸). The two-layer neural networks are trained using stochastic gradient descent (SGD) with a batch size of 128 and a learning rate of 0.1, for 400 epochs to fully converge. During training, a learning rate of 5×10⁻³, a dropout rate of 0.05, L2 weight decay of 0.1, and a warm-up phase of 2,000 iterations were applied.
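The near-ridgeless setting above (λ = 10⁻⁸) can be sketched as a ridge regression fit. This is a minimal illustration, not the paper's implementation: the Gaussian data, problem dimensions, and the `ridge_fit` helper are assumptions chosen for the example.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-8):
    """Solve ridge regression: w = (X^T X + n*lam*I)^{-1} X^T y.

    lam=1e-8 mirrors the near-ridgeless regularization reported
    in the experiment setup; the data below is synthetic.
    """
    n, d = X.shape
    A = X.T @ X + n * lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y)

# Synthetic Gaussian data (illustrative assumption).
rng = np.random.default_rng(0)
n, d = 200, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat = ridge_fit(X, y)
print(float(np.mean((w_hat - w_true) ** 2)))  # small recovery error
```

With λ this small, the estimator is essentially the ordinary least-squares solution; the regularizer only stabilizes the linear solve.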