Minimum Width for Universal Approximation using Squashable Activation Functions

Authors: Jonghyun Shin, Namjun Kim, Geonho Hwang, Sejun Park

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable activation functions, which can approximate the identity function and the binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate L^p functions from [0, 1]^{d_x} to R^{d_y}, the minimum width is max{d_x, d_y, 2} unless d_x = d_y = 1; the same bound holds for d_x = d_y = 1 if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for these general classes of activation functions. Our result can therefore characterize the minimum width for a broad class of practical activation functions by establishing their squashability. For example, we show that any non-affine analytic function (e.g., non-affine polynomial, sigmoid, tanh, sin, exp) is squashable (Lemma 4). Furthermore, a wide class of piecewise continuously differentiable functions, including leaky ReLU and hardswish, is also squashable (Lemma 5). Hence, our result significantly extends the prior exact minimum width results for ReLU and its variants. We prove our main result in Section 4 and conclude the paper in Section 5. Proofs of technical lemmas are deferred to the Appendix.
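The squashability property quoted above — a single activation emulating both the identity and the binary step via affine pre- and post-composition — can be illustrated numerically. The sketch below is ours, not the paper's; `approx_step` and `approx_identity` are hypothetical names. It shows the two primitives for sigmoid: a large affine slope squashes it toward the step 1[x > 0], while a tiny slope keeps inputs in its near-linear region around 0 so an affine post-map recovers the identity.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def approx_step(x, k=1e4):
    """Affine pre-scaling by a large slope k squashes sigmoid
    toward the binary step 1[x > 0]."""
    return sigmoid(k * x)

def approx_identity(x, eps=1e-4):
    """An affine map with tiny slope eps stays inside sigmoid's
    near-linear region; the affine post-map undoes the shift and
    scale (sigma(0) = 1/2, sigma'(0) = 1/4)."""
    return (sigmoid(eps * x) - 0.5) / (0.25 * eps)

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, approx_step(x), approx_identity(x))
```

On [-1, 1], `approx_step` is within machine precision of the step function (and exactly 1/2 at 0), while `approx_identity` matches the identity to roughly `eps**2` accuracy.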
Researcher Affiliation | Academia | 1Department of Mathematics Education, Korea University; 2Department of Artificial Intelligence, Korea University; 3Department of Mathematical Sciences, GIST. Correspondence to: Sejun Park <EMAIL>.
Pseudocode | No | The paper describes theoretical constructions and proofs, such as: "We use the coding scheme (Park et al., 2021b) to prove our result. In particular, we construct our decoder f_dec as a curve that densely fills the codomain of a target function f so that sup_{x ∈ [0,1]^{d_x}} inf_{y ∈ f_dec([0,1])} ||f(x) − y|| is small. We then construct our encoder to map each x ∈ [0, 1]^{d_x} to a neighborhood of f_dec^{-1}(z) for some z close to f(x)." However, it does not include any pseudocode or algorithm blocks.
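The quoted encoder/decoder idea can be mimicked with a toy numerical analogue. This is not the paper's construction (which builds the maps from narrow networks): here bit interleaving stands in for the space-filling decoder curve, and `encode`/`decode` are hypothetical names of ours. Each coordinate of x ∈ [0,1]^2 is quantized to n bits, the bits are interleaved into one scalar code z ∈ [0,1], and the decoder maps z back to a point within 2^-n of x — mirroring how f_dec fills the codomain densely while the encoder picks a code near f_dec^{-1}(f(x)).

```python
def encode(x, n=8):
    """Quantize each coordinate of x in [0,1]^2 to n bits and
    interleave the bits (MSB first) into one scalar in [0,1]."""
    qs = [min(int(xi * 2**n), 2**n - 1) for xi in x]
    z = 0
    for b in range(n):
        for q in qs:
            z = 2 * z + ((q >> (n - 1 - b)) & 1)
    return z / 2.0**(2 * n)

def decode(z, n=8):
    """De-interleave the 2n-bit code back into two n-bit
    coordinates, recovering x up to quantization error 2^-n."""
    code = int(z * 2**(2 * n))
    qs = [0, 0]
    for b in range(2 * n):
        qs[b % 2] = 2 * qs[b % 2] + ((code >> (2 * n - 1 - b)) & 1)
    return [q / 2.0**n for q in qs]

x = [0.3, 0.8]
x_hat = decode(encode(x))  # agrees with x up to 2^-8 per coordinate
```

As n grows, the image of `decode` becomes dense in [0,1]^2, which is the same density property the paper requires of the curve f_dec([0,1]).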
Open Source Code | No | The paper does not contain any statements about releasing code, links to code repositories, or mentions of code in supplementary materials.
Open Datasets | No | The paper states: "We show that for networks using a squashable activation function to universally approximate L^p functions from [0, 1]^{d_x} to R^{d_y}, the minimum width is max{d_x, d_y, 2} unless d_x = d_y = 1; the same bound holds for d_x = d_y = 1 if the activation function is monotone," i.e., σ networks of width w are dense in L^p([0, 1]^{d_x}, R^{d_y}) but σ networks of width w − 1 are not dense. The paper uses theoretical function spaces rather than empirical datasets.
Dataset Splits | No | The paper analyzes theoretical properties of neural networks using abstract function spaces (L^p functions from [0, 1]^{d_x} to R^{d_y}) and does not involve empirical datasets or their splits.
Hardware Specification | No | The paper focuses on theoretical mathematical proofs regarding neural network properties and does not describe any experimental setup or hardware used for computations.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and definitions; therefore, it does not mention any software dependencies or version numbers.
Experiment Setup | No | The paper provides theoretical analysis and mathematical proofs for the minimum width for universal approximation. It does not describe any experimental setup, hyperparameter values, or training configurations.