Variance Reduction of Stochastic Hypergradient Estimation by Mixed Fixed-Point Iteration
Authors: Naoyuki Terashita, Satoshi Hara
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The abstract states: 'Empirical evaluations on synthetic and real-world tasks verify our theoretical results and superior variance reduction over existing methods.' The paper includes a dedicated 'Experiments' section (Section 5) with subsections on 'Effect of Mixing Rate' and 'Comparison with Existing Approaches', performing evaluations on machine learning tasks such as hyperparameter optimization, influence estimation, and meta-learning. |
| Researcher Affiliation | Collaboration | Naoyuki Terashita is affiliated with Hitachi, Ltd., which is an industry affiliation. Satoshi Hara is affiliated with University of Electro-Communications, which is an academic affiliation. This mix indicates a collaboration. |
| Pseudocode | Yes | The paper includes a section titled 'F Python Implementation of Mixed FP-KM' which provides a Python code block (Figure 7) that explicitly implements the Mixed FP-KM algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/hitachi-rd-cv/mixed-fp. |
| Open Datasets | Yes | The paper explicitly mentions and cites several well-known public datasets: 'Adult Income dataset (Becker & Kohavi, 1996)', 'Fashion-MNIST (Xiao et al., 2017)', and 'California Housing dataset (Pace & Barry, 1997)'. |
| Dataset Splits | Yes | Table 2 (Experiment settings for the real-world tasks) explicitly lists 'ntrain' and 'nval' values for each dataset used: Adult Income (5000 train, 5000 val), Fashion MNIST (5000 train, 5000 val), California Housing (5000 train, 5000 val). Additionally, Section E.1 states: 'In addition to the training and validation splits used in Section 5.2, we introduce a separate test set of 5,000 samples to evaluate the final model performance after the outer optimization.' |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It mentions 'wall-clock basis' in the context of computational cost but does not specify the hardware. |
| Software Dependencies | No | The paper mentions the Adam optimizer, and PyTorch is implied by the Python implementation, but it does not specify version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | Section D.2.2 'Influence Estimation' states: 'Any inner-problem optimization was performed using the Adam optimizer with a learning rate of 0.01. To rule out the effect incurred by inexact x(λ), for any task, we used the full-batch inner loss to compute gradients for Adam and ran 1,000 epochs to ensure the convergence.' It also details grid search ranges for hyperparameters. Section E.1 'Settings' further specifies: 'We configure the bilevel optimization with 100 outer optimization steps using SGD with a learning rate of 20.0, and 100 inner optimization steps per outer iteration using Adam with a learning rate of 0.01.' |
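The mixing-rate mechanism referenced in the table (the Mixed FP-KM algorithm of Figure 7) can be illustrated with a minimal Krasnosel'skii–Mann (KM) fixed-point iteration. This is a hedged sketch, not the paper's actual implementation: the operator `T`, the matrix `A`, the vector `b`, and the mixing rate `alpha` below are toy placeholders chosen only to show how mixing interpolates between the plain fixed-point update and averaging with the previous iterate.

```python
import numpy as np

def km_iteration(T, x0, alpha=0.5, num_steps=200):
    """Krasnosel'skii-Mann iteration with mixing rate alpha.

    Update rule: x <- (1 - alpha) * x + alpha * T(x).
    alpha = 1 recovers the plain fixed-point iteration; smaller alpha
    mixes in the previous iterate, which is the kind of averaging the
    paper exploits to reduce the variance of stochastic hypergradients.
    """
    x = x0
    for _ in range(num_steps):
        x = (1.0 - alpha) * x + alpha * T(x)
    return x

# Toy example: solve x = A x + b, a contraction since ||A|| < 1.  Linear
# fixed-point equations of this shape arise in implicit hypergradient
# estimation (inverse-Hessian-vector products).
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
b = np.array([1.0, 2.0])
T = lambda x: A @ x + b

x_km = km_iteration(T, np.zeros(2), alpha=0.5)
x_exact = np.linalg.solve(np.eye(2) - A, b)
print(np.allclose(x_km, x_exact))  # True: converges to the fixed point
```

With a deterministic operator, any mixing rate in (0, 1] converges here; the paper's analysis concerns the stochastic case, where the choice of mixing rate trades iteration progress against estimator variance.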