Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
Authors: Yann Ollivier, Ludovic Arnold, Anne Auger, Nikolaus Hansen
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σᵢ \|xᵢ − yᵢ\|, Σᵢ \|(1 − xᵢ) − yᵢ\|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. [...] We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). [...] Figure 3 shows ten random runs (out of 300 in our experiments) of the two algorithms: for each of the two optima we plot its distance to the nearest of the points drawn from P_{θᵗ}, as a function of time t. |
| Researcher Affiliation | Academia | Yann Ollivier EMAIL, CNRS & LRI (UMR 8623), Université Paris-Saclay, 91405 Orsay, France / Ludovic Arnold EMAIL, Univ. Paris-Sud, LRI, 91405 Orsay, France / Anne Auger EMAIL / Nikolaus Hansen EMAIL, Inria & CMAP, École Polytechnique, 91128 Palaiseau, France |
| Pseudocode | No | Definition 5 (IGO algorithms) The IGO algorithm associated with parametrization θ, sample size N and step size δt is the following update rule for the parameter θᵗ. At each step, N sample points x₁, . . . , x_N are drawn according to the distribution P_{θᵗ}. The parameter is updated according to θ^{t+δt} = θᵗ + δt Σ_{i=1}^{N} ŵᵢ ∇̃_θ ln P_θ(xᵢ) \|_{θ=θᵗ} (16) = θᵗ + δt I⁻¹(θᵗ) Σ_{i=1}^{N} ŵᵢ ∂ln P_θ(xᵢ)/∂θ \|_{θ=θᵗ} (17) |
| Open Source Code | Yes | The code used for these experiments can be found at http://www.ludovicarnold.com/projects:igocode . |
| Open Datasets | Yes | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σᵢ \|xᵢ − yᵢ\|, Σᵢ \|(1 − xᵢ) − yᵢ\|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). The dimension was d = 40 and we used an RBM with only one latent variable (n_h = 1). |
| Dataset Splits | No | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σᵢ \|xᵢ − yᵢ\|, Σᵢ \|(1 − xᵢ) − yᵢ\|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. The experiment uses a synthetic objective function, not a dataset that typically requires train/test/validation splits. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) are provided in the paper for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper. |
| Experiment Setup | Yes | The dimension was d = 40 and we used an RBM with only one latent variable (n_h = 1). [...] We used a large sample size of N = 10,000 for Monte Carlo sampling, so as to be close to the theoretical IGO flow behavior. We also tested a smaller, more realistic sample size of N = 10 (still keeping N_Fish = 10,000), with similar but noisier results. The selection scheme (Section 2.2) was w(q) = 1_{q ≤ 1/5} (cf. Rechenberg 1994), so that the best 20% of points in the sample are given weight 1 for the update. The RBM was initialized so that at startup, the distribution P_{θ⁰} is close to uniform on (x, h), in line with Proposition 2. Explicitly, we set w_ij ∼ N(0, 1/(d·n_h)) and then a_i ← −Σ_j w_ij/2 and b_j ← −Σ_i w_ij/2 + N(0, 0.01/d²), which ensures a close-to-uniform initial distribution. Full experimental details, including detailed setup and additional results, can be found in a previous version of this article (Ollivier et al., 2011, Section 5). |
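The quoted excerpts describe the paper's synthetic objective, the two-min function f_y(x) = min(Σᵢ |xᵢ − yᵢ|, Σᵢ |(1 − xᵢ) − yᵢ|) on {0, 1}^d. As a minimal sketch (not the authors' released code, which is linked in the Open Source Code row), it can be written as:

```python
import numpy as np

def two_min(x, y):
    """Two-min objective on {0,1}^d: the minimum of the Hamming-type
    distances from x to the base point y and to y's binary complement.
    Both x = y and x = 1 - y are global optima with value 0."""
    x = np.asarray(x)
    y = np.asarray(y)
    return min(np.abs(x - y).sum(), np.abs((1 - x) - y).sum())
```

For example, `two_min([0, 1, 0], [1, 0, 1])` is 0 because the first argument is the bitwise complement of the base point, matching the paper's statement that the function has two optima.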
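The Pseudocode and Experiment Setup rows together specify the IGO update (Equations 16-17) with the quantile selection scheme w(q) = 1_{q ≤ 1/5}. The following is an illustrative sketch of one IGO step, not the paper's RBM experiment: it uses a family of independent Bernoulli(pᵢ) distributions, for which the Fisher matrix is diagonal with Iᵢᵢ = 1/(pᵢ(1 − pᵢ)), so the natural-gradient update simplifies in closed form. The function name `igo_step_bernoulli` and all parameter defaults are choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def igo_step_bernoulli(p, f, N=10_000, dt=0.1, natural=True):
    """One IGO step (minimization) for independent Bernoulli(p_i) on {0,1}^d.
    Selection scheme w(q) = 1_{q <= 1/5}: the best 20% of the N sample
    points receive weight 1 (normalized to sum to 1), the rest weight 0."""
    d = p.shape[0]
    X = (rng.random((N, d)) < p).astype(float)      # N sample points from P_p
    fx = np.array([f(x) for x in X])
    ranks = np.argsort(np.argsort(fx))              # rank 0 = best (lowest f)
    w = (ranks < N // 5).astype(float)
    w /= w.sum()
    # Vanilla log-likelihood gradient: d/dp_i ln P_p(x) = (x_i - p_i) / (p_i (1 - p_i))
    grad = (w[:, None] * (X - p) / (p * (1 - p))).sum(axis=0)
    if natural:
        # Multiplying by the inverse Fisher matrix (diagonal, I_ii = 1/(p_i(1-p_i)))
        # reduces the natural-gradient update to sum_i w_i (x_i - p).
        grad = grad * p * (1 - p)
    # Keep parameters strictly inside (0, 1)
    return np.clip(p + dt * grad, 1e-3, 1 - 1e-3)
```

Setting `natural=False` gives the "vanilla gradient" variant the paper compares against (omitting the Fisher matrix). The paper's actual experiment uses an RBM with one latent variable rather than this independent-Bernoulli family.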