Range, not Independence, Drives Modularity in Biologically Inspired Representations
Authors: Will Dorrell, Kyle Hsu, Luke Hollingsworth, Jin Hwa Lee, Jiajun Wu, Chelsea Finn, Peter Latham, Timothy Behrens, James Whittington
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From this theory, we extract and validate predictions in a variety of empirical studies on how data distribution affects modularisation in nonlinear feedforward and recurrent neural networks trained on supervised and unsupervised tasks. Furthermore, we apply these ideas to neuroscience data, showing that range independence can be used to understand the mixing or modularising of spatial and reward information in entorhinal recordings in seemingly conflicting experiments. |
| Researcher Affiliation | Academia | 1University College London, 2Stanford University, 3Oxford University |
| Pseudocode | No | The paper describes mathematical derivations and conceptual diagrams, but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Reproducibility Statement Our theoretical work is fully detailed in the appendices, and code to reproduce our empirical work can be found at https://github.com/kylehkhsu/modular. |
| Open Datasets | Yes | We study the performance of QLAE trained to autoencode a subset of the Isaac3D dataset (Nie, 2019), a naturalistic image dataset with well defined underlying latent dimensions. |
| Dataset Splits | No | The paper describes data generation and usage (e.g., "Three-dimensional source data is sampled from [0, 1]3 and discretised to 21 values per dimension.", "We subsample 6 out of the 9 sources in the Isaac3D dataset... This yields a dataset of size 12, 288.", "We train the Student RNN on 10000 sequences generated by the Teacher RNN.") but does not explicitly provide information on how these datasets are split into training, validation, or test sets for reproducibility. |
| Hardware Specification | No | Each experiment was executed using a single consumer GPU on an HPC with 2 CPUs and 4GB of RAM. [...] Each experiment was executed using a single consumer GPU on an HPC with 8 CPUs and 8GB of RAM. |
| Software Dependencies | No | The paper mentions "PyTorch" for network design and the "Adam optimiser", but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We regularise the activity and weight energies, and enforce nonnegativity using a ReLU. [...] λR is the regularisation hyperparameter, typically set to 0.01 unless specified otherwise. The network is trained using the Adam optimiser with a learning rate ranging between 0.001 and 0.01, adjusted as needed. Experiments are run for order 104 epochs on 5 random seeds. [...] All models use λreconstruct = 1, λactivity energy = 0.01, λactivity nonnegativity = 1, and λweight energy = 0.0001. Models are initialised from a He initialisation scaled by 0.3 and optimized with Adam using learning rate 0.001. [...] For the linear RNN, we used learning rate 1e-3, 30k training iterations and λtarget = 1, λactivity = 0.5, λpositivity = 5, and λweight = 0.02. For the nonlinear RNN, we used learning rate 7.5e-4, 40k training iterations and λtarget = 5, λactivity = 0.5 and λweight = 0.01. In both cases, we initialised the weights to be orthogonal and the biases at zero, and used the Adam optimiser. |
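The quoted setup combines a reconstruction term with activity-energy, activity-nonnegativity, and weight-energy penalties, each scaled by its own λ. A minimal sketch of such a regularised objective, using the λ values the paper reports for the autoencoder models; the function and argument names are our own, and this is an illustration of the loss structure rather than the authors' implementation:

```python
import numpy as np

def regularised_loss(x, x_hat, h, W,
                     lam_reconstruct=1.0,      # λreconstruct
                     lam_activity=0.01,        # λactivity energy
                     lam_nonneg=1.0,           # λactivity nonnegativity
                     lam_weight=1e-4):         # λweight energy
    """Sketch of a regularised training objective: squared reconstruction
    error plus energy penalties on activities and weights, with an extra
    penalty on negative activity (the paper also enforces nonnegativity
    architecturally via a ReLU)."""
    reconstruct = np.sum((x - x_hat) ** 2)        # reconstruction error
    activity_energy = np.sum(h ** 2)              # activity energy
    nonneg = np.sum(np.minimum(h, 0.0) ** 2)      # penalise negative activity
    weight_energy = np.sum(W ** 2)                # weight energy
    return (lam_reconstruct * reconstruct
            + lam_activity * activity_energy
            + lam_nonneg * nonneg
            + lam_weight * weight_energy)
```

With all penalties active, the optimum trades reconstruction accuracy against small, nonnegative activities and small weights, which is the pressure the paper argues shapes modular versus mixed representations.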