Formation of Representations in Neural Networks
Authors: Liu Ziyin, Isaac Chuang, Tomer Galanti, Tomaso Poggio
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental evidence that supports predictions resulting from the CRH. We also perform experiments to test mechanisms that break the CRH. |
| Researcher Affiliation | Collaboration | 1Massachusetts Institute of Technology 2Texas A&M University 3NTT Research |
| Pseudocode | No | The paper describes theoretical frameworks and experimental results but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement regarding the availability of source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | res1: ResNet-18 (11M parameters) for image classification; res2: ResNet-18 for self-supervised learning tasks with the CIFAR-10/100 datasets; llm: a six-layer, eight-head transformer (100M parameters) trained on the OpenWebText (OWT) dataset (Gokaslan & Cohen, 2019). |
| Dataset Splits | Yes | res1: ResNet-18 (11M parameters) for image classification; ... We measure the covariance matrices with data points from the test set. res2: ResNet-18 for self-supervised learning tasks with the CIFAR-10/100 datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam, and activation functions like ReLU, but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | fc1: ...depth of the network (D = 4), width of the network (d = 100), weight decay strength (γ = 2×10⁻⁵), minibatch size (B = 100). fc2: ...SGD with a learning rate of 0.1, momentum 0.9, and γ = 10⁻⁴ for 10⁵ steps... batch size of 100. res1: ...trained with SGD with learning rate 0.01, momentum 0.9, cosine annealing for 200 epochs, and batch size 128. |
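The res1 setup anneals the learning rate with a cosine schedule from 0.01 over 200 epochs. A minimal sketch of that schedule in plain Python (the floor `lr_min = 0` is an assumption; the paper does not state one, and the function name is illustrative, not from the paper):

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=0.01, lr_min=0.0):
    """Cosine-annealed learning rate, as in the res1 setup
    (SGD, initial lr 0.01, 200 epochs).

    Decays smoothly from lr_max at epoch 0 to lr_min at total_epochs.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cosine)

print(cosine_annealing_lr(0, 200))    # start of training: 0.01
print(cosine_annealing_lr(100, 200))  # halfway: 0.005
print(cosine_annealing_lr(200, 200))  # end of training: 0.0
```

In a PyTorch reproduction this would correspond to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=200` wrapped around SGD with momentum 0.9.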