When do neural networks learn world models?
Authors: Tianren Zhang, Guanyu Chen, Feng Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we illustrate the algorithmic implications of our results on two representative tasks: polynomial extrapolation (Xu et al., 2021) and learning physical laws (Kang et al., 2025). We show that architectures inspired by our analysis outperform conventional architectures such as ReLU MLPs and transformers (Vaswani et al., 2017) in these tasks. Section D presents numerical experiments that substantiate our theoretical results. |
| Researcher Affiliation | Academia | 1Department of Automation, Tsinghua University, Beijing, China. Correspondence to: Feng Chen <EMAIL>. |
| Pseudocode | No | The paper describes algorithmic implications and experimental procedures in prose, but it does not contain a clearly labeled pseudocode block or algorithm section with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code, nor does it provide a link to a code repository or indicate code availability in supplementary materials for the methodology described. |
| Open Datasets | No | The paper describes generating its own datasets for the polynomial extrapolation and learning physical laws tasks, specifying how the data is sampled and its parameters (e.g., "We consider fitting and extrapolating degree-n polynomials...", "we create training and test sequences representing ball-shaped object movements..."). It does not mention using or providing access to well-known public datasets with specific access information like URLs, DOIs, or formal citations with author/year for public access. |
| Dataset Splits | Yes | For Polynomial Extrapolation: Training, validation, and test data are uniformly sampled from [-1, 1), [-1, 1), and [-2, 2), respectively. For each polynomial instance, we sample 50,000 training data, 1,000 validation data, and 10,000 test data. For Learning Physical Laws: For both settings, we sample 1M training sequences and 50,000 test sequences. |
| Hardware Specification | Yes | All of our experiments were conducted using PyTorch (Paszke et al., 2019) on NVIDIA V100/A100 GPUs. |
| Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al., 2019)", but it does not provide a specific version number for PyTorch or any other software libraries or tools. |
| Experiment Setup | Yes | For Polynomial Extrapolation: Number of layers d is set to 4. Width of each W^(i) from {128, 256, 512}. Initial learning rate from {1e-3, 1e-4, 1e-5}. We use a cosine learning rate scheduler. Weight decay is set to 0.1. Batch size is set to 512. Number of epochs is set to 400. For Learning Physical Laws: Number of layers of the transformer is set to 4. Number of heads of the transformer is set to 4. Width of the transformer is set to 512. Initial learning rate is randomly sampled from [1e-6, 1e-3]. We use a cosine learning rate scheduler. Weight decay is set to 1e-4. Batch size is set to 1024. Number of epochs is set to 300. |
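The split sizes, sampling ranges, and cosine schedule reported above can be sketched concretely. The following is a minimal illustration, not the authors' code: `make_polynomial_splits` and `cosine_lr` are hypothetical names, and the polynomial-evaluation details are assumptions; only the sample counts, the uniform [-1, 1) train/validation vs. [-2, 2) extrapolation-test ranges, and the cosine decay shape come from the paper.

```python
import math
import random


def make_polynomial_splits(coeffs, n_train=50_000, n_val=1_000, n_test=10_000, seed=0):
    """Sample one polynomial instance's splits as described in the table:
    train/val uniform on [-1, 1), test on the wider [-2, 2) extrapolation range.
    `coeffs[i]` is the coefficient of x**i (ordering is an assumption)."""
    rng = random.Random(seed)

    def poly(x):
        # Horner's method: evaluates sum_i coeffs[i] * x**i.
        y = 0.0
        for c in reversed(coeffs):
            y = y * x + c
        return y

    def sample(n, lo, hi):
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        return [(x, poly(x)) for x in xs]

    return {
        "train": sample(n_train, -1.0, 1.0),
        "val": sample(n_val, -1.0, 1.0),
        "test": sample(n_test, -2.0, 2.0),  # extrapolation region
    }


def cosine_lr(epoch, total_epochs, lr0):
    """Cosine annealing from lr0 down to 0 over the run, matching the shape of
    PyTorch's CosineAnnealingLR with eta_min=0 (the floor value is an assumption)."""
    return lr0 * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

For example, with the polynomial-extrapolation settings, `cosine_lr(0, 400, 1e-3)` returns the initial rate `1e-3` and decays smoothly to 0 by epoch 400.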