Range, not Independence, Drives Modularity in Biologically Inspired Representations

Authors: Will Dorrell, Kyle Hsu, Luke Hollingsworth, Jin Hwa Lee, Jiajun Wu, Chelsea Finn, Peter Latham, Timothy Behrens, James Whittington

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "From this theory, we extract and validate predictions in a variety of empirical studies on how data distribution affects modularisation in nonlinear feedforward and recurrent neural networks trained on supervised and unsupervised tasks. Furthermore, we apply these ideas to neuroscience data, showing that range independence can be used to understand the mixing or modularising of spatial and reward information in entorhinal recordings in seemingly conflicting experiments."
Researcher Affiliation | Academia | "1University College London, 2Stanford University, 3Oxford University"
Pseudocode | No | The paper describes mathematical derivations and conceptual diagrams, but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Reproducibility Statement: Our theoretical work is fully detailed in the appendices, and code to reproduce our empirical work can be found at https://github.com/kylehkhsu/modular."
Open Datasets | Yes | "We study the performance of QLAE trained to autoencode a subset of the Isaac3D dataset (Nie, 2019), a naturalistic image dataset with well-defined underlying latent dimensions."
Dataset Splits | No | The paper describes data generation and usage (e.g., "Three-dimensional source data is sampled from [0, 1]^3 and discretised to 21 values per dimension.", "We subsample 6 out of the 9 sources in the Isaac3D dataset... This yields a dataset of size 12,288.", "We train the Student RNN on 10000 sequences generated by the Teacher RNN."), but does not explicitly state how these datasets are split into training, validation, or test sets for reproducibility.
Hardware Specification | No | "Each experiment was executed using a single consumer GPU on an HPC with 2 CPUs and 4GB of RAM." "Each experiment was executed using a single consumer GPU on an HPC with 8 CPUs and 8GB of RAM."
Software Dependencies | No | The paper mentions "PyTorch" for network design and the "Adam optimiser", but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup | Yes | "We regularise the activity and weight energies, and enforce nonnegativity using a ReLU. [...] λR is the regularisation hyperparameter, typically set to 0.01 unless specified otherwise. The network is trained using the Adam optimiser with a learning rate ranging between 0.001 and 0.01, adjusted as needed. Experiments are run for order 10^4 epochs on 5 random seeds. [...] All models use λreconstruct = 1, λactivity energy = 0.01, λactivity nonnegativity = 1, and λweight energy = 0.0001. Models are initialised from a He initialisation scaled by 0.3 and optimised with Adam using learning rate 0.001. [...] For the linear RNN, we used learning rate 1e-3, 30k training iterations and λtarget = 1, λactivity = 0.5, λpositivity = 5, and λweight = 0.02. For the nonlinear RNN, we used learning rate 7.5e-4, 40k training iterations and λtarget = 5, λactivity = 0.5 and λweight = 0.01. In both cases, we initialised the weights to be orthogonal and the biases at zero, and used the Adam optimiser."
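The reported setup combines a reconstruction loss with activity-energy, activity-nonnegativity, and weight-energy penalties, optimised with Adam. A minimal PyTorch sketch of that objective is given below; it is not the authors' implementation (see their repository for that), and the model architecture and function names here are illustrative assumptions. Only the loss weights (λreconstruct = 1, λactivity energy = 0.01, λactivity nonnegativity = 1, λweight energy = 0.0001) and the Adam learning rate of 0.001 are taken from the reported setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-in for the paper's networks; architecture is an assumption.
class TinyAutoencoder(nn.Module):
    def __init__(self, dim_in=3, dim_hidden=32):
        super().__init__()
        self.encoder = nn.Linear(dim_in, dim_hidden)
        self.decoder = nn.Linear(dim_hidden, dim_in)

    def forward(self, x):
        z = self.encoder(x)                   # hidden activity
        return self.decoder(F.relu(z)), z


def regularised_loss(model, x,
                     lam_rec=1.0,            # λreconstruct
                     lam_act=0.01,           # λactivity energy
                     lam_nonneg=1.0,         # λactivity nonnegativity
                     lam_weight=1e-4):       # λweight energy
    x_hat, z = model(x)
    loss = lam_rec * F.mse_loss(x_hat, x)            # reconstruction term
    loss = loss + lam_act * (z ** 2).mean()          # activity energy penalty
    loss = loss + lam_nonneg * F.relu(-z).mean()     # penalise negative activity
    loss = loss + lam_weight * sum((p ** 2).sum()    # weight energy penalty
                                   for p in model.parameters() if p.ndim == 2)
    return loss


torch.manual_seed(0)
model = TinyAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=0.001)  # reported learning rate
x = torch.rand(64, 3)  # toy stand-in for 3-D sources sampled from [0, 1]^3

for _ in range(10):
    opt.zero_grad()
    loss = regularised_loss(model, x)
    loss.backward()
    opt.step()
```

The reported runs differ in scale (order 10^4 epochs, 5 seeds, and per-experiment λ values for the RNN variants); this sketch only shows how the four penalty terms compose into one objective.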