Layered-Parameter Perturbation for Zeroth-Order Optimization of Optical Neural Networks

Authors: Hiroshi Sawada, Kazuo Aoyama, Masaya Notomi

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed method using the special covariance matrix significantly outperformed conventional methods. Section 5 shows the experimental results.
Researcher Affiliation | Industry | Communication Science Laboratories, NTT Corporation, Japan; Basic Research Laboratories, NTT Corporation, Japan. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: ZO optimization with layered-parameter perturbations
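The paper's Algorithm 1 itself is not reproduced in this report. As a rough illustration of the general idea behind ZO (zeroth-order) optimization with random parameter perturbations, a minimal sketch follows; it uses a textbook two-point gradient estimator over Gaussian perturbations, not the authors' layered-parameter scheme, and all function and variable names are illustrative.

```python
import numpy as np

def zo_gradient_estimate(loss, theta, mu=1e-3, num_perturbations=8, rng=None):
    """Generic two-point zeroth-order gradient estimate.

    Averages directional estimates over random Gaussian perturbations u:
        g ~= (loss(theta + mu*u) - loss(theta - mu*u)) / (2*mu) * u
    This is a standard ZO estimator, NOT the paper's layered scheme.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    g = np.zeros_like(theta)
    for _ in range(num_perturbations):
        u = rng.standard_normal(theta.shape)
        # Two loss evaluations per perturbation; no analytic gradient needed.
        g += (loss(theta + mu * u) - loss(theta - mu * u)) / (2.0 * mu) * u
    return g / num_perturbations

# Usage: minimize a simple quadratic with plain ZO gradient descent.
loss = lambda th: float(np.sum(th ** 2))
theta = np.ones(4)
rng = np.random.default_rng(0)
for _ in range(300):
    theta -= 0.1 * zo_gradient_estimate(loss, theta, rng=rng)
print(loss(theta))  # final loss, near zero after convergence
```

Because the estimator only queries loss values, it applies to hardware such as ONN circuits where analytic gradients of the physical device are unavailable; the paper's contribution lies in how the perturbations are structured per layer, which this sketch does not attempt to model.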
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor a link to a code repository.
Open Datasets | Yes | We used the MNIST (LeCun and Cortes 2010) and Fashion MNIST (Xiao, Rasul, and Vollgraf 2017) as image datasets, which were of appropriate difficulties for ONN circuits without memory.
Dataset Splits | No | The paper mentions using MNIST and Fashion MNIST and reports test accuracies, implying the standard splits. However, it does not explicitly state the train/test/validation percentages or sample counts.
Hardware Specification | Yes | We built our customized CUDA kernels (Luebke 2008) for computational acceleration, and ran the program on an NVIDIA RTX A6000 (48 GB) as the GPU.
Software Dependencies | No | We employed PyTorch (Paszke et al. 2019) to simulate and train ONNs. We built our customized CUDA kernels (Luebke 2008) for computational acceleration. While PyTorch and CUDA are mentioned, specific version numbers are not provided for either.
Experiment Setup | Yes | Table 2 hyperparameter settings: mini-batch size B = 100; number of perturbation vectors Q = K; scale (ZO optimization) λ = 1/N; smoothing (ZO optimization) µ = 0.001/√N; exponential smoothing of F_u: α = 0.01; regularizing weight for Σ_u: ρ = 0.1; update interval of F_u and Σ_u: T_ud = 100; number of random input vectors R_in = 100; number of output perturbation vectors R_out = 100. The learning rate η of Adam and the step size σ_0 of CMA-ES were optimized using Optuna (Akiba et al. 2019) for each combination of task, dimensionality (K = 16, 32, 64), and method.
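For reference, the reported Table 2 settings can be collected into a small configuration sketch. The key names and the helper function are illustrative (not from the authors' code); N (number of trainable parameters) and K (circuit dimensionality) are supplied by the experiment.

```python
import math

def make_hyperparams(N, K):
    """Hyperparameters as reported in the paper's Table 2.

    N: number of trainable parameters; K: circuit dimensionality.
    Key names are illustrative, not taken from the authors' code.
    """
    return {
        "batch_size": 100,                   # mini-batch size B
        "num_perturbations": K,              # Q = K perturbation vectors
        "scale": 1.0 / N,                    # lambda (ZO optimization)
        "smoothing": 0.001 / math.sqrt(N),   # mu (ZO optimization)
        "ema_alpha": 0.01,                   # exponential smoothing of F_u
        "reg_rho": 0.1,                      # regularizing weight for Sigma_u
        "update_interval": 100,              # T_ud for F_u and Sigma_u
        "num_random_inputs": 100,            # R_in
        "num_output_perturbations": 100,     # R_out
    }

# Example: settings for a K = 32 circuit with N = 1024 parameters.
cfg = make_hyperparams(N=1024, K=32)
print(cfg["num_perturbations"], cfg["scale"])
```

Note that η (Adam learning rate) and σ_0 (CMA-ES step size) are deliberately omitted here, since the paper tunes them per task, dimensionality, and method with Optuna rather than fixing them.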