How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
Authors: Arthur Jacot, Seok Hoan Choi, Yuxiao Wen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we observe a good match between the scaling laws of learning and our theory, as well as qualitative features such as transitions between regimes depending on whether it is harder to learn the symmetries of a task, or to learn the task given its symmetries. ... 4.2.1 Experiments: We train our models on a synthetic dataset obtained by composing two Gaussian processes, Y = h(g(X)), with Matérn kernels K_g, K_h chosen so that g and h have the right differentiability. In Figure 2, we compare the empirical rates (obtained by a linear fit on a log-log plot of test error as a function of N) with the rates min{1/2, 2ν_g/(2ν_g + d_in), 2ν_h/(2ν_h + d_mid)}, which seem to yield the best match. ... Appendix A, Experimental Setup: In this section, we review our numerical experiments and their setup on both synthetic and real-world datasets in order to address the theoretical results more clearly and intuitively. |
| Researcher Affiliation | Academia | Arthur Jacot, Seok Hoan Choi & Yuxiao Wen, Courant Institute, New York University, New York, NY 10012 |
| Pseudocode | No | The paper describes algorithms and methods in prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures with structured, code-like formatting. |
| Open Source Code | Yes | The code used for experiments is publicly available here. |
| Open Datasets | Yes | In our study, we utilized both MNIST (Modified National Institute of Standards and Technology) and WESAD (Wearable Stress and Affect Detection) to train our Acc Nets for classification tasks. ... Schmidt et al., 2018. |
| Dataset Splits | Yes | Of the 52,500 data points, 50,000 were allocated for training and 2,500 were used for the test dataset. ... We implemented a train-test split ratio of approximately 75:25, resulting in 100,000 samples for the training set and the remaining 36,482 samples for the test set. |
| Hardware Specification | Yes | We conducted experiments utilizing 12 NVIDIA V100 GPUs (each with 32GB of memory) over approximately 7 days to train all the synthetic datasets. |
| Software Dependencies | No | We utilized ReLU as the activation function and the L2-norm as the cost function, with the Adam optimizer. The paper does not provide specific version numbers for these or any other software libraries or frameworks used. |
| Experiment Setup | Yes | For FCNN and Acc Nets, we set the network depth to 12 layers, with the layer widths as [din, 500, 500, ..., 500, dout] for DNNs, and [din, 900, 100, 900, ..., 100, 900, dout] for Acc Nets. ... The total number of batches was set to 5, and the training process was conducted over 3600 epochs, divided into three phases. The detailed optimizer parameters are as follows: 1. For the first 1200 epochs: learning rate (lr) = 1.5 × 0.001, weight decay = 0; 2. For the second 1200 epochs: lr = 0.4 × 0.001, weight decay = 0.002; 3. For the final 1200 epochs: lr = 0.1 × 0.001, weight decay = 0.005. |
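The rate formula quoted in the Research Type row can be made concrete. A minimal sketch (pure Python/NumPy, not the authors' code; the values of ν and d below are illustrative, and the "measured" errors are a synthetic stand-in) of both the predicted exponent and the log-log fit used to estimate empirical rates:

```python
import numpy as np

def predicted_rate(nu_g, nu_h, d_in, d_mid):
    # min{1/2, 2νg/(2νg + d_in), 2νh/(2νh + d_mid)}: the error-decay
    # exponent compared against the empirical rates in Figure 2.
    return min(0.5, 2 * nu_g / (2 * nu_g + d_in), 2 * nu_h / (2 * nu_h + d_mid))

# Illustrative values: a smoother inner map g in 8 input dimensions,
# a rougher outer map h on a 2-dimensional intermediate space.
rate = predicted_rate(nu_g=2.0, nu_h=1.0, d_in=8, d_mid=2)  # → 1/3

# Empirical rate: (negative) slope of a linear fit on a log-log plot of
# test error as a function of N, as described in the paper.
N = np.array([1e3, 1e4, 1e5, 1e6])
test_error = 5.0 * N ** -rate          # stand-in for measured test errors
slope = -np.polyfit(np.log(N), np.log(test_error), 1)[0]
```

With an exact power law the fitted slope recovers the exponent; on real measurements the fit is only approximate, which is why the paper reports a "best match" among candidate rates.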
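The three-phase optimizer schedule in the Experiment Setup row can be expressed as a simple epoch lookup. A hedged sketch (the function name and structure are mine, not from the paper's released code; only the multipliers, base lr of 0.001, phase lengths, and weight decays are taken from the quote above):

```python
def phase_hparams(epoch, base_lr=0.001):
    """Return (learning_rate, weight_decay) for a given epoch of the
    3600-epoch run, following the three 1200-epoch phases quoted above."""
    if epoch < 1200:
        return 1.5 * base_lr, 0.0    # phase 1: lr = 1.5 × 0.001, no decay
    elif epoch < 2400:
        return 0.4 * base_lr, 0.002  # phase 2: lr = 0.4 × 0.001
    else:
        return 0.1 * base_lr, 0.005  # phase 3: lr = 0.1 × 0.001

lr, wd = phase_hparams(1500)  # mid-run: second phase applies
```

In a training loop these values would typically be pushed into the Adam optimizer's parameter groups at each phase boundary.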