Position: A Theory of Deep Learning Must Include Compositional Sparsity

Authors: David A. Danhofer, Davide D'Ascenzo, Rafael Dubach, Tomaso A. Poggio

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function that drives their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore highly likely to be present in all current learning problems. While some promising theoretical insights exist on questions of approximation and generalization in the setting of compositionally sparse functions, several important questions on the learnability and optimization of DNNs remain open.
Researcher Affiliation | Academia | 1) Center for Brains, Minds and Machines (CBMM), MIT, Cambridge, MA, USA; 2) ETH Zurich, Zurich, Switzerland; 3) Politecnico di Torino, Torino, Italy; 4) University of Milan, Milan, Italy; 5) University of Zurich, Zurich, Switzerland. Correspondence to: Davide D'Ascenzo <EMAIL>.
Pseudocode | No | The paper includes mathematical definitions, theorems, and proofs (e.g., in Appendix A), but it contains no sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor structured, code-like steps for a procedure.
Open Source Code | No | The paper contains no statements about releasing code, no links to code repositories, and no mention of code being available in supplementary materials for the methodology described.
Open Datasets | No | The paper is theoretical and does not present experiments that would require specific datasets with access information. While it refers to datasets used in other research (e.g., ImageNet, AlphaGo, LLM training corpora), it provides no access information for a dataset used in its own analysis or experiments.
Dataset Splits | No | The paper is theoretical and analyzes no specific datasets; therefore, it provides no information on training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and presents no experimental results; accordingly, it specifies no hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and presents no experimental results; accordingly, it lists no software dependencies with version numbers needed to replicate an experiment.
Experiment Setup | No | The paper describes no experimental setup, hyperparameters, or system-level training settings, as it presents no experiments of its own.
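To make the paper's central notion concrete, the following minimal sketch (not from the paper; the functions `h` and `f` are hypothetical examples) shows a compositionally sparse function of 8 inputs: it is built as a binary tree of 2-ary constituents, so every constituent function depends on only a low-dimensional (here, 2-dimensional) subset of its inputs, while the overall composition depends on all 8.

```python
# Illustrative sketch, assuming a binary-tree composition: each
# constituent h has arity 2, yet the composed f depends on all 8 inputs.

def h(a, b):
    # Hypothetical low-dimensional constituent function (arity 2).
    return a * b + 1.0

def f(x):
    # f: R^8 -> R as a depth-3 tree of h's; no node ever sees
    # more than 2 of its inputs at once.
    assert len(x) == 8
    l1 = [h(x[i], x[i + 1]) for i in range(0, 8, 2)]  # 4 leaf-level nodes
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]           # 2 intermediate nodes
    return h(l2[0], l2[1])                            # root

print(f([1.0] * 8))  # prints 26.0
```

A dense function of 8 inputs would require a single constituent acting on all 8 coordinates at once; the tree above replaces it with 7 two-dimensional pieces, which is the structural property the paper argues DNNs exploit.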