Variance Reduction of Stochastic Hypergradient Estimation by Mixed Fixed-Point Iteration

Authors: Naoyuki Terashita, Satoshi Hara

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The paper states that "empirical evaluations on synthetic and real-world tasks verify our theoretical results and superior variance reduction over existing methods." It includes a dedicated 'Experiments' section (Section 5), with subsections 'Effect of Mixing Rate' and 'Comparison with Existing Approaches', covering tasks such as hyperparameter optimization, influence estimation, and meta-learning. |
| Researcher Affiliation | Collaboration | Naoyuki Terashita is affiliated with Hitachi, Ltd. (industry); Satoshi Hara is affiliated with the University of Electro-Communications (academia). This mix of industry and academic affiliations indicates a collaboration. |
| Pseudocode | Yes | The paper includes a section titled 'F Python Implementation of Mixed FP-KM', which provides a Python code block (Figure 7) explicitly implementing the Mixed FP-KM algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/hitachi-rd-cv/mixed-fp. |
| Open Datasets | Yes | The paper explicitly mentions and cites several well-known public datasets: the Adult Income dataset (Becker & Kohavi, 1996), Fashion-MNIST (Xiao et al., 2017), and the California Housing dataset (Pace & Barry, 1997). |
| Dataset Splits | Yes | Table 2 (experiment settings for the real-world tasks) explicitly lists 'ntrain' and 'nval' values for each dataset used: Adult Income (5,000 train, 5,000 val), Fashion-MNIST (5,000 train, 5,000 val), and California Housing (5,000 train, 5,000 val). Additionally, Section E.1 states: 'In addition to the training and validation splits used in Section 5.2, we introduce a separate test set of 5,000 samples to evaluate the final model performance after the outer optimization.' |
| Hardware Specification | No | The paper provides no specific details about the hardware (e.g., CPU or GPU models, memory) used for the experiments. It discusses computational cost on a 'wall-clock basis' but does not specify the hardware. |
| Software Dependencies | No | The paper mentions the Adam optimizer and, implicitly through its Python implementation, PyTorch, but it does not specify version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | Section D.2.2 'Influence Estimation' states: 'Any inner-problem optimization was performed using the Adam optimizer with a learning rate of 0.01. To rule out the effect incurred by inexact x(λ), for any task, we used the full-batch inner loss to compute gradients for Adam and ran 1,000 epochs to ensure the convergence.' It also details grid-search ranges for hyperparameters. Section E.1 'Settings' further specifies: 'We configure the bilevel optimization with 100 outer optimization steps using SGD with a learning rate of 20.0, and 100 inner optimization steps per outer iteration using Adam with a learning rate of 0.01.' |
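The Mixed FP-KM algorithm assessed above builds on the Krasnosel'skii–Mann (KM) mixed fixed-point scheme, which averages the current iterate with the operator output under a mixing rate. The sketch below is a generic, hypothetical illustration of that scheme on a linear contraction, not the paper's Figure 7 implementation; the function name `mixed_fp_km`, the mixing rate `beta`, and the toy operator are all assumptions.

```python
import numpy as np

def mixed_fp_km(T, x0, beta=0.5, n_iters=100):
    """Generic Krasnosel'skii-Mann (KM) mixed fixed-point iteration.

    Updates x_{k+1} = (1 - beta) * x_k + beta * T(x_k), where `beta` is
    the mixing rate. Hypothetical sketch, not the paper's Figure 7 code.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = (1.0 - beta) * x + beta * T(x)
    return x

# Example: T(x) = A x + b is a contraction (spectral radius of A < 1),
# so the iteration converges to the fixed point solving (I - A) x = b.
A = np.array([[0.5, 0.1], [0.0, 0.4]])
b = np.array([1.0, 2.0])
x_star = mixed_fp_km(lambda x: A @ x + b, np.zeros(2), beta=0.7, n_iters=200)
```

For beta = 1 this reduces to the plain fixed-point iteration; smaller mixing rates damp each update, which is the mechanism the paper analyzes for variance reduction of stochastic hypergradient estimates.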
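The reported schedule (100 outer SGD steps at learning rate 20.0, 100 inner Adam steps at learning rate 0.01) can be mirrored on a toy bilevel problem. The sketch below is a hypothetical illustration only: the quadratic losses, the 0.01 scaling of the outer loss (chosen so that SGD with the large reported learning rate stays stable on this toy), and the exact implicit hypergradient (dx/dλ = 1 here) are all assumptions, in place of the paper's stochastic hypergradient estimator.

```python
import math

def adam_minimize(grad, x0, lr=0.01, n_steps=100, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar objective with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy bilevel problem (hypothetical, for illustration only):
#   inner:  x(lam) = argmin_x (x - lam)^2   ->  x(lam) = lam
#   outer:  minimize 0.01 * (x(lam) - 1)^2 over lam
lam, x = 0.0, 0.0
for _ in range(100):                                   # 100 outer SGD steps, lr 20.0
    x = adam_minimize(lambda z: 2.0 * (z - lam), x)    # 100 inner Adam steps, lr 0.01
    hypergrad = 0.02 * (x - 1.0)                       # exact implicit gradient (dx/dlam = 1)
    lam -= 20.0 * hypergrad
```

With warm-started inner solves, the outer iterate contracts toward the optimum lam = 1 up to the inner-approximation error, illustrating why the paper's settings pair many cheap inner steps with each outer update.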