Cross-regularization: Adaptive Model Complexity through Validation Gradients

Authors: Carlos Stein Brito

ICML 2025

Reproducibility assessment (variable: result, with the LLM's supporting response):
Research Type: Experimental
  LLM response: "The empirical results in Figure 1(A-C) show cross-regularization converging to optimally tuned ridge regression through direct gradient descent. ... Figure 2. Noise dynamics in VGG-16 on CIFAR-10 reveal architectural regularization patterns. ... Figures 10, 11 and 12 show the detailed plots for the systematic method analyses."
Researcher Affiliation: Industry
  LLM response: "Night City Labs, Lisbon, Portugal. Correspondence to: Carlos Stein Brito <EMAIL>."
Pseudocode: Yes
  LLM response: "Algorithm 1: Cross-regularization Training"
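The report names Algorithm 1 without reprinting it. As a rough illustration of the idea behind cross-regularization training, the sketch below uses a one-step-unrolled validation (hyper)gradient to learn a ridge penalty by gradient descent, matching the paper's claim that the method converges to optimally tuned ridge regression. All names, sizes, and learning rates here are illustrative assumptions, not the paper's Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (sizes are illustrative choices).
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# 80/20 train/validation split, as in the paper's diabetes experiment.
k = int(0.8 * n)
Xt, yt, Xv, yv = X[:k], y[:k], X[k:], y[k:]

w = np.zeros(d)
log_lam = 0.0              # learnable ridge strength, log-parameterized
lr_w, lr_lam = 1e-2, 1e-1  # separate learning rates, mirroring the protocol

for step in range(2000):
    lam = np.exp(log_lam)
    # Model step: gradient of the regularized training MSE.
    g_train = 2 * Xt.T @ (Xt @ w - yt) / len(yt) + 2 * lam * w
    w_new = w - lr_w * g_train
    # Hyper step: one-step-unrolled validation gradient w.r.t. log(lam).
    # dw_new/dlam = -lr_w * 2 * w, chained with the validation-loss gradient.
    g_val = 2 * Xv.T @ (Xv @ w_new - yv) / len(yv)
    d_logl = g_val @ (-lr_w * 2 * w) * lam
    w, log_lam = w_new, log_lam - lr_lam * d_logl

print(np.exp(log_lam))  # learned ridge strength
```

The key design point the sketch shares with the paper is the split of gradient flows: model weights descend the training loss, while the regularization parameter descends the validation loss.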
Open Source Code: No
  LLM response: "The paper does not provide concrete access to source code. It does not contain a specific repository link, an explicit code release statement, or code in supplementary materials."
Open Datasets: Yes
  LLM response: "Figure 2. Noise dynamics in VGG-16 on CIFAR-10 reveal architectural regularization patterns. ... We evaluate L1 cross-regularization on the diabetes regression dataset (Efron et al., 2004). ... Figure 4. Dataset growth adaptation and adaptive augmentation. A: Performance evolution shows successful knowledge transfer at the epoch-100 transition from partial to full dataset. B: Total regularization strength adapts automatically: stronger regularization compensates for limited initial data, then decreases as the full dataset provides natural regularization. Vertical line marks the dataset transition. C: Evolution of learned augmentation parameters on SVHN."
Dataset Splits: Yes
  LLM response: "We evaluate L1 cross-regularization on the diabetes regression dataset (Efron et al., 2004). The dataset consists of 442 patients with 10 physiological features. Data is standardized and split 80/20 into train/validation sets."
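The quoted protocol (standardize, then split 80/20) can be reproduced in a few lines. This sketch uses random stand-in data with the stated shape (442 patients, 10 features) rather than the actual diabetes dataset, and assumes a shuffled split, which the report does not specify.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for the diabetes data: 442 patients, 10 physiological features.
X = rng.normal(size=(442, 10))
y = rng.normal(size=442)

# Standardize each feature to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Shuffle, then split 80/20 into train/validation sets.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_val, y_val = X[idx[cut:]], y[idx[cut:]]

print(X_train.shape, X_val.shape)  # (353, 10) (89, 10)
```

A stricter protocol would compute the standardization statistics on the training split only; the report's quote does not say which variant the paper uses.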
Hardware Specification: No
  LLM response: "The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments."
Software Dependencies: No
  LLM response: "The paper mentions the Adam optimizer but does not specify version numbers for any programming languages, libraries, or frameworks used in the implementation."
Experiment Setup: Yes
  LLM response: "C.3. Training Protocol. Optimization settings: Adam optimizer; learning rates: 1e-4 (model), 1e-1 (noise); initialization: log σ = 3; batch size: 512; training epochs: 100."
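For reference, the quoted protocol collects into a small configuration sketch. The key names are illustrative assumptions (the paper lists only the values), and the sketch assumes log σ means the natural logarithm.

```python
import math

# Training protocol from Appendix C.3 of the paper, as a config dict.
# Key names are illustrative; only the values come from the report.
config = {
    "optimizer": "adam",
    "lr_model": 1e-4,        # learning rate for model parameters
    "lr_noise": 1e-1,        # learning rate for noise (regularization) parameters
    "init_log_sigma": 3.0,   # initialization: log sigma = 3
    "batch_size": 512,
    "epochs": 100,
}

# Initial noise scale implied by log sigma = 3 (assuming natural log).
sigma0 = math.exp(config["init_log_sigma"])
print(sigma0)
```

Note the three-orders-of-magnitude gap between the model and noise learning rates, consistent with the regularization parameters being few and coarse-grained relative to the model weights.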