Layered-Parameter Perturbation for Zeroth-Order Optimization of Optical Neural Networks
Authors: Hiroshi Sawada, Kazuo Aoyama, Masaya Notomi
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results (Section 5) show that the proposed method using the special covariance matrix significantly outperformed conventional methods. |
| Researcher Affiliation | Industry | ¹Communication Science Laboratories, NTT Corporation, Japan; ²Basic Research Laboratories, NTT Corporation, Japan. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: ZO optimization with layered-parameter perturbations |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available or provide a link to a code repository. |
| Open Datasets | Yes | We used MNIST (LeCun and Cortes 2010) and Fashion MNIST (Xiao, Rasul, and Vollgraf 2017) as image datasets, which were of appropriate difficulty for ONN circuits without memory. |
| Dataset Splits | No | The paper mentions using MNIST and Fashion MNIST datasets and reports test accuracies, implying the use of standard splits. However, it does not explicitly state the specific train/test/validation percentages or sample counts within the text. |
| Hardware Specification | Yes | We built our customized CUDA kernels (Luebke 2008) for computational acceleration, and ran the program on an NVIDIA RTX A6000 (48 GB) as the GPU. |
| Software Dependencies | No | We employed PyTorch (Paszke et al. 2019) to simulate and train ONNs. We built our customized CUDA kernels (Luebke 2008) for computational acceleration, and ran the program on an NVIDIA RTX A6000 (48 GB) as the GPU. While PyTorch and CUDA are mentioned, specific version numbers are not provided for either. |
| Experiment Setup | Yes | Table 2 hyperparameter settings: mini-batch size B = 100; number of perturbation vectors Q = K; scale (ZO optimization) λ = 1/N; smoothing (ZO optimization) µ = 0.001/√N; exponential smoothing of F_u α = 0.01; regularizing weight for Σ_u ρ = 0.1; update interval of F_u and Σ_u T_ud = 100; number of random input vectors R_in = 100; number of output perturbation vectors R_out = 100. The learning rate η of Adam and the step size σ₀ of CMA-ES were optimized with Optuna (Akiba et al. 2019) for each combination of task, dimensionality (K = 16, 32, 64), and method. |
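The table references Algorithm 1 (ZO optimization with layered-parameter perturbations) and a smoothing parameter µ. As a minimal sketch of the underlying idea, the snippet below implements a standard two-point zeroth-order gradient estimate with isotropic Gaussian perturbations; it is illustrative only and omits the paper's layered perturbation structure and learned covariance Σ_u. The function name `zo_gradient_estimate` and the quadratic test loss are our own inventions, not from the paper.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, theta, q=8, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate.

    Averages q Gaussian-direction finite differences:
        g ≈ (1/q) Σ_i [ (f(θ + µu_i) - f(θ - µu_i)) / (2µ) ] u_i
    This is the generic ZO estimator, not the paper's layered variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(q):
        u = rng.standard_normal(theta.size)  # isotropic perturbation direction
        delta = (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2.0 * mu)
        grad += delta * u
    return grad / q

# Usage: for the quadratic loss f(x) = ||x||^2 the true gradient is 2x,
# so the estimate should approach [2, 2, 2, 2] as q grows.
loss = lambda x: float(np.sum(x ** 2))
theta = np.ones(4)
g = zo_gradient_estimate(loss, theta, q=2000, mu=1e-3,
                         rng=np.random.default_rng(0))
```

Feeding such an estimate to an optimizer like Adam (as the paper does, with η tuned by Optuna) yields a gradient-free training loop suitable for hardware such as ONNs, where backpropagation through the physical system is unavailable.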