Generative Feature Training of Thin 2-Layer Networks
Authors: Johannes Hertrich, Sebastian Neumayer
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach by numerical examples. First, we visually inspect the obtained features. Here, we also check if they recover the correct subspaces. Secondly, we benchmark our methods on common test functions from approximation theory, i.e., with a known ground truth. Lastly, we target regression on some datasets from the UCI database (Kelly et al., 2023). |
| Researcher Affiliation | Academia | Johannes Hertrich, Université Paris Dauphine-PSL; Sebastian Neumayer, Technische Universität Chemnitz |
| Pseudocode | Yes | Algorithm 1: GFT and GFT-r training procedures. 1: Given: data (x_k, y_k)_{k=1}^M, architecture f_{w,b} as in (1), generator G_θ, latent distribution η. 2: while training G_θ do 3: sample N latents z_l ∼ η and set w = G_θ(z) 4: compute optimal b(w) and f_{w,b(w)} based on (6) 5: compute ∇_θ L(θ) or ∇_θ L_reg(θ) with automatic differentiation 6: perform Adam update for θ. 7: if GFT-r then 8: while refining w do 9: set w = G_θ(z) 10: compute optimal b(w) and f_{w,b(w)} based on (6) 11: compute ∇_z F(z) or ∇_z F_reg(z) with automatic differentiation 12: perform Adam update for z. 13: Output: features w and optimal weights b(w) |
| Open Source Code | Yes | Our PyTorch implementation is available online. We run all experiments on an NVIDIA RTX 4090 GPU. [...] The PyTorch implementation corresponding to our experiments is available at https://github.com/johertrich/generative_feature_training. |
| Open Datasets | Yes | Next, we apply our method for regression on several UCI datasets Kelly et al. (2023). For this, we do not have an underlying ground truth function f. Here, we compare our methods with standard gradient-based neural network training, SHRIMP and SALSA. |
| Dataset Splits | Yes | To pick the regularization strength λ, we divide the original training data into a training (90%) and a validation (10%) set. |
| Hardware Specification | Yes | Our PyTorch implementation is available online. We run all experiments on an NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch implementation' and 'Adam optimizer' but does not specify version numbers for these software components. For example, it states: 'This procedure is implemented in many automatic differentiation packages such as PyTorch', and 'We optimize the loss functions for GFT and for the feature refinement with the Adam optimizer'. |
| Experiment Setup | Yes | We optimize the loss functions for GFT and for the feature refinement with the Adam optimizer using a learning rate of 1·10⁻⁴ for 40000 steps. The regularization ε for solving the least squares problem (6) is set to ε = 1·10⁻⁷. For the neural network optimization, we use the Adam optimizer with a learning rate of 1·10⁻³ for 100000 steps. In all cases, we discretize the spatial integral for the regularization term in (10) by 1000 samples. For the kernel ridge regression, we use a Gauss kernel with its parameter chosen by the median rule. That is, we set it to the median distance of two points in the dataset. [...] Further, we choose the generator G_θ for the proposal distribution p_w = G_θ#N(0, I_d) as a ReLU network with 3 hidden layers and 512 neurons per hidden layer. |
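The inner solve in Algorithm 1 (computing the optimal outer weights b(w) from a regularized least-squares problem with ε = 1·10⁻⁷) can be sketched in plain NumPy. The exact form of equation (6) is not reproduced in the table, so the ReLU feature map and the ridge-style regularization below are assumptions, not the paper's verbatim formulation:

```python
import numpy as np

def optimal_outer_weights(W, X, y, eps=1e-7):
    """Sketch of the inner least-squares solve: for fixed features W
    (N x d), find outer weights b minimizing ||A b - y||^2 + eps ||b||^2,
    where A[k, n] = relu(<w_n, x_k>). The ReLU feature map is an
    assumption standing in for the paper's architecture (1)."""
    A = np.maximum(X @ W.T, 0.0)  # (M, N) feature matrix
    # Ridge-regularized normal equations: (A^T A + eps I) b = A^T y
    return np.linalg.solve(A.T @ A + eps * np.eye(W.shape[0]), A.T @ y)

# Usage on toy data with random features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 samples in R^3
W = rng.normal(size=(10, 3))   # 10 candidate features
y = rng.normal(size=50)
b = optimal_outer_weights(W, X, y)
print(b.shape)  # (10,)
```

Because this solve has a closed form, gradients with respect to the features (and hence the generator parameters θ) can flow through it via automatic differentiation, which is what steps 4–5 of Algorithm 1 rely on.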
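The median rule quoted for the kernel ridge regression baseline (set the Gauss kernel parameter to the median distance of two points in the dataset) is a one-liner to implement. This sketch assumes Euclidean distances and that the median is taken over distinct pairs:

```python
import numpy as np

def median_bandwidth(X):
    """Median rule: return the median Euclidean distance between two
    distinct points in the dataset X (n x d)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n, n) pairwise distances
    iu = np.triu_indices(len(X), k=1)  # distinct pairs only, skip diagonal
    return np.median(dists[iu])

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
print(median_bandwidth(X))  # pairwise distances 5, 4, 3 -> median 4.0
```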
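The dataset-splits row states that λ is picked by splitting the original training data 90/10 into training and validation parts. A generic sketch of that selection loop follows; the candidate grid and the linear ridge model are illustrative stand-ins, since the paper's actual regularized objective is not reproduced here:

```python
import numpy as np

def pick_lambda(X, y, lambdas, seed=0):
    """Hold out 10% of the training data as a validation set, fit a
    model per candidate lambda on the remaining 90%, and return the
    lambda with the lowest validation error. The linear ridge fit is a
    stand-in for the paper's regularized training procedure."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    cut = int(0.9 * len(X))
    tr, va = perm[:cut], perm[cut:]
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(X.shape[1]),
                            X[tr].T @ y[tr])
        err = np.mean((X[va] @ w - y[va]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)
print(pick_lambda(X, y, [1e-3, 1e-1, 10.0]))
```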