When do neural networks learn world models?
Authors: Tianren Zhang, Guanyu Chen, Feng Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we illustrate the algorithmic implications of our results on two representative tasks: polynomial extrapolation (Xu et al., 2021) and learning physical laws (Kang et al., 2025). We show that architectures inspired by our analysis outperform conventional architectures such as ReLU MLPs and transformers (Vaswani et al., 2017) in these tasks. Section D presents numerical experiments that substantiate our theoretical results. |
| Researcher Affiliation | Academia | 1Department of Automation, Tsinghua University, Beijing, China. Correspondence to: Feng Chen <EMAIL>. |
| Pseudocode | No | The paper describes algorithmic implications and experimental procedures in prose, but it does not contain a clearly labeled pseudocode block or algorithm section with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code, nor does it provide a link to a code repository or indicate code availability in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes generating its own datasets for the polynomial extrapolation and learning physical laws tasks, specifying how the data is sampled and its parameters (e.g., "We consider fitting and extrapolating degree-n polynomials...", "we create training and test sequences representing ball-shaped object movements..."). It does not mention using or providing access to well-known public datasets with specific access information like URLs, DOIs, or formal citations with author/year for public access. |
| Dataset Splits | Yes | For Polynomial Extrapolation: Training, validation, and test data are uniformly sampled from [-1, 1), [-1, 1), and [-2, 2), respectively. For each polynomial instance, we sample 50,000 training data, 1,000 validation data, and 10,000 test data. For Learning Physical Laws: For both settings, we sample 1M training sequences and 50,000 test sequences. |
| Hardware Specification | Yes | All of our experiments were conducted using PyTorch (Paszke et al., 2019) on NVIDIA V100/A100 GPUs. |
| Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)", but it does not provide a specific version number for PyTorch or any other software libraries or tools. |
| Experiment Setup | Yes | For Polynomial Extrapolation: Number of layers d is set to 4. Width of each W^(i) from {128, 256, 512}. Initial learning rate from {1e-3, 1e-4, 1e-5}. We use a cosine learning rate scheduler. Weight decay is set to 0.1. Batch size is set to 512. Number of epochs is set to 400. For Learning Physical Laws: Number of layers of the transformer is set to 4. Number of heads of the transformer is set to 4. Width of the transformer is set to 512. Initial learning rate is randomly sampled from [1e-6, 1e-3]. We use a cosine learning rate scheduler. Weight decay is set to 1e-4. Batch size is set to 1024. Number of epochs is set to 300. |
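The split sizes, sampling ranges, and cosine schedule reported above can be sketched concretely. The following is a minimal illustration, not the authors' code: `make_polynomial_splits` and `cosine_lr` are hypothetical names, and the polynomial-evaluation details are assumptions; only the sample counts, the uniform [-1, 1) train/validation vs. [-2, 2) extrapolation-test ranges, and the cosine decay shape come from the paper.

```python
import math
import random


def make_polynomial_splits(coeffs, n_train=50_000, n_val=1_000, n_test=10_000, seed=0):
    """Sample one polynomial instance's splits as described in the table:
    train/val uniform on [-1, 1), test on the wider [-2, 2) extrapolation range.
    `coeffs[i]` is the coefficient of x**i (ordering is an assumption)."""
    rng = random.Random(seed)

    def poly(x):
        # Horner's method: evaluates sum_i coeffs[i] * x**i.
        y = 0.0
        for c in reversed(coeffs):
            y = y * x + c
        return y

    def sample(n, lo, hi):
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        return [(x, poly(x)) for x in xs]

    return {
        "train": sample(n_train, -1.0, 1.0),
        "val": sample(n_val, -1.0, 1.0),
        "test": sample(n_test, -2.0, 2.0),  # extrapolation region
    }


def cosine_lr(epoch, total_epochs, lr0):
    """Cosine annealing from lr0 down to 0 over the run, matching the shape of
    PyTorch's CosineAnnealingLR with eta_min=0 (the floor value is an assumption)."""
    return lr0 * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

For example, with the polynomial-extrapolation settings, `cosine_lr(0, 400, 1e-3)` returns the initial rate `1e-3` and decays smoothly to 0 by epoch 400.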