Cross-regularization: Adaptive Model Complexity through Validation Gradients
Authors: Carlos Stein Brito
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results in Figure 1(A-C) show cross-regularization converging to optimally tuned ridge regression through direct gradient descent. ... Figure 2. Noise dynamics in VGG-16 on CIFAR-10 reveal architectural regularization patterns. ... Figures 10, 11 and 12 show the detailed plots for the systematic method analyses. |
| Researcher Affiliation | Industry | 1Night City Labs, Lisbon, Portugal. Correspondence to: Carlos Stein Brito <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Cross-regularization Training |
| Open Source Code | No | The paper does not provide access to source code: it contains no repository link, no explicit code-release statement, and no code in the supplementary materials. |
| Open Datasets | Yes | Figure 2. Noise dynamics in VGG-16 on CIFAR-10 reveal architectural regularization patterns. ... We evaluate L1 cross-regularization on the diabetes regression dataset (Efron et al., 2004). ... Figure 4. Dataset growth adaptation and adaptive augmentation. ... A: Performance evolution shows successful knowledge transfer at the epoch-100 transition from partial to full dataset. B: Total regularization strength automatically adapts: stronger regularization compensates for limited initial data, then decreases as the full dataset provides natural regularization. Vertical line marks the dataset transition. C: Evolution of learned augmentation parameters on SVHN. |
| Dataset Splits | Yes | We evaluate L1 cross-regularization on the diabetes regression dataset (Efron et al., 2004). The dataset consists of 442 patients with 10 physiological features. Data is standardized and split 80/20 into train/validation sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify version numbers for any programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | C.3. Training Protocol. Optimization settings: Adam optimizer. Learning rates: 10⁻⁴ (model), 10⁻¹ (noise). Initialization: log σ = 3. Batch size: 512. Training epochs: 100. |
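The setup quoted above (an 80/20 train/validation split, a small model learning rate, and a much larger learning rate for the regularization variables tuned by validation gradients) can be illustrated with a minimal sketch. This is not the paper's Algorithm 1; it is a toy 1-D ridge regression in pure Python where the weight follows the training gradient and the penalty strength λ follows a one-step unrolled hypergradient of the validation loss. All names (`train_grad`, `val_grad_w`, `hyper`) and the synthetic data are illustrative assumptions.

```python
import random

random.seed(0)

# Synthetic 1-D regression y ≈ 2x, split 80/20 into train/validation
# (mirroring the paper's diabetes-dataset split).
n = 100
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]
split = int(0.8 * n)
xt, yt = xs[:split], ys[:split]
xv, yv = xs[split:], ys[split:]

def train_grad(w, lam):
    # d/dw of mean squared train error plus ridge penalty lam * w^2.
    g = sum(2 * x * (w * x - y) for x, y in zip(xt, yt)) / len(xt)
    return g + 2 * lam * w

def val_grad_w(w):
    # d/dw of mean squared validation error (no penalty on validation).
    return sum(2 * x * (w * x - y) for x, y in zip(xv, yv)) / len(xv)

w, lam = 0.0, 1.0
lr_w, lr_lam = 1e-2, 1e-1  # echoes the two learning-rate scales above
for _ in range(500):
    # Model step on the regularized training loss.
    w_new = w - lr_w * train_grad(w, lam)
    # Hypergradient of the validation loss w.r.t. lam through this one
    # unrolled step: d(w_new)/d(lam) = -lr_w * 2 * w.
    hyper = val_grad_w(w_new) * (-lr_w * 2 * w)
    lam = max(0.0, lam - lr_lam * hyper)  # keep the penalty nonnegative
    w = w_new

print(round(w, 2), round(lam, 3))
```

The design point this illustrates is why the regularization variables can tolerate a learning rate three orders of magnitude larger than the model's: λ is a single scalar updated by a smooth validation signal, whereas the model weights see noisy per-batch gradients.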