Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

Authors: Yann Ollivier, Ludovic Arnold, Anne Auger, Nikolaus Hansen

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. [...] We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). [...] Figure 3 shows ten random runs (out of 300 in our experiments) of the two algorithms: for each of the two optima we plot its distance to the nearest of the points drawn from P_θ^t, as a function of time t.
Researcher Affiliation | Academia | Yann Ollivier EMAIL CNRS & LRI (UMR 8623), Université Paris-Saclay, 91405 Orsay, France / Ludovic Arnold EMAIL Univ. Paris-Sud, LRI, 91405 Orsay, France / Anne Auger EMAIL, Nikolaus Hansen EMAIL Inria & CMAP, Ecole polytechnique, 91128 Palaiseau, France
Pseudocode | No | Definition 5 (IGO algorithms) The IGO algorithm associated with parametrization θ, sample size N and step size δt is the following update rule for the parameter θ^t. At each step, N sample points x_1, ..., x_N are drawn according to the distribution P_θ^t. The parameter is updated according to θ^{t+δt} = θ^t + δt Σ_{i=1}^{N} ŵ_i ∇̃_θ ln P_θ(x_i) |_{θ=θ^t} (16) = θ^t + δt I^{-1}(θ^t) Σ_{i=1}^{N} ŵ_i ∂ ln P_θ(x_i)/∂θ |_{θ=θ^t} (17)
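Definition 5 is abstract; as a concrete instance, the following sketch (ours, not from the paper) applies the update to a family of independent Bernoulli(θ_j) distributions on {0,1}^d, the case in which the paper notes IGO recovers PBIL/cGA-style rules. Here the Fisher matrix is diagonal with I_jj = 1/(θ_j(1−θ_j)), so the natural-gradient step simplifies to θ_j ← θ_j + δt Σ_i ŵ_i (x_{i,j} − θ_j). The function name and clipping bounds are our own choices.

```python
import numpy as np

def igo_step_bernoulli(theta, f, N=100, dt=0.1, q_cut=0.2, rng=None):
    """One IGO update (Definition 5) for independent Bernoulli(theta_j) on {0,1}^d.

    For this family the Fisher matrix is diagonal, I_jj = 1/(theta_j (1 - theta_j)),
    and it cancels against the same factor in the vanilla gradient of ln P_theta, so
    the natural-gradient update reduces to theta += dt * sum_i w_i (x_i - theta).
    """
    rng = rng or np.random.default_rng()
    d = theta.shape[0]
    x = (rng.random((N, d)) < theta).astype(float)  # N samples drawn from P_theta
    fx = np.array([f(xi) for xi in x])
    # Quantile-based selection weights w(q) = 1_{q <= q_cut}: the best q_cut
    # fraction of the sample gets weight 1 (normalized by N), the rest weight 0.
    ranks = np.argsort(np.argsort(fx))              # rank 0 = best (minimization)
    w = (ranks < q_cut * N).astype(float) / N
    # Natural-gradient step; clip to keep the distribution non-degenerate.
    theta = theta + dt * (w @ (x - theta))
    return np.clip(theta, 1e-3, 1 - 1e-3)
```

Iterating this step on a linear objective drives θ toward the optimum's vertex of the hypercube.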
Open Source Code | Yes | The code used for these experiments can be found at http://www.ludovicarnold.com/projects:igocode .
Open Datasets | Yes | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). The dimension was d = 40 and we used an RBM with only one latent variable (dh = 1).
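The two-min objective quoted above is simple enough to state directly in code; this is a minimal sketch (the helper name `two_min` is ours):

```python
import numpy as np

def two_min(x, y):
    """Two-min objective on {0,1}^d based at y:
    f_y(x) = min(sum_i |x_i - y_i|, sum_i |(1 - x_i) - y_i|).

    It has two optima with value 0: x = y, and the binary complement x = 1 - y.
    """
    x, y = np.asarray(x), np.asarray(y)
    return min(np.abs(x - y).sum(), np.abs((1 - x) - y).sum())
```

Both x = y and its complement evaluate to 0, which is what makes the function a test of whether the optimization trajectory commits to one optimum or hedges between the two.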
Dataset Splits | No | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. The experiment uses a synthetic objective function, not a dataset that typically requires train/test/validation splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instances) are provided in the paper for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper.
Experiment Setup | Yes | The dimension was d = 40 and we used an RBM with only one latent variable (dh = 1). [...] We used a large sample size of N = 10,000 for Monte Carlo sampling, so as to be close to the theoretical IGO flow behavior. We also tested a smaller, more realistic sample size of N = 10 (still keeping N_Fish = 10,000), with similar but noisier results. The selection scheme (Section 2.2) was w(q) = 1_{q ≤ 1/5} (cf. Rechenberg 1994), so that the best 20% points in the sample are given weight 1 for the update. The RBM was initialized so that at startup, the distribution P_θ0 is close to uniform on (x, h), in line with Proposition 2. Explicitly, we set w_ij ~ N(0, 1/(d·dh)) and then b_j ← −Σ_i w_ij/2 and a_i ← −Σ_j w_ij/2 + N(0, 0.01/d²), which ensures a close-to-uniform initial distribution. Full experimental details, including detailed setup and additional results, can be found in a previous version of this article (Ollivier et al., 2011, Section 5).
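The initialization formulas above can be sketched as follows. This is our reading of the quoted setup, with the minus signs and sums reconstructed from the stated goal of a close-to-uniform P_θ0 (the biases cancel the mean field of the weights); treat it as an illustration, not the authors' exact code.

```python
import numpy as np

def init_rbm_near_uniform(d, dh, rng=None):
    """Initialize RBM parameters (visible biases a, hidden biases b, weights w)
    so that the initial distribution over (x, h) is close to uniform, as in the
    quoted setup: w_ij ~ N(0, 1/(d*dh)), b_j = -sum_i w_ij / 2, and
    a_i = -sum_j w_ij / 2 plus small N(0, 0.01/d^2) noise to break symmetry.
    """
    rng = rng or np.random.default_rng()
    w = rng.normal(0.0, np.sqrt(1.0 / (d * dh)), size=(d, dh))
    b = -w.sum(axis=0) / 2                      # hidden biases cancel column sums
    a = -w.sum(axis=1) / 2 + rng.normal(0.0, np.sqrt(0.01 / d**2), size=d)
    return a, b, w
```

With the paper's settings (d = 40, dh = 1) the weights have standard deviation 1/√40 ≈ 0.16 and the bias noise is tiny (std 0.0025), so the induced distribution starts very close to uniform.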