Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

Authors: Yann Ollivier, Ludovic Arnold, Anne Auger, Nikolaus Hansen

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. [...] We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). [...] Figure 3 shows ten random runs (out of 300 in our experiments) of the two algorithms: for each of the two optima we plot its distance to the nearest of the points drawn from P_θ^t, as a function of time t.
Researcher Affiliation | Academia | Yann Ollivier EMAIL CNRS & LRI (UMR 8623), Université Paris-Saclay, 91405 Orsay, France / Ludovic Arnold EMAIL Univ. Paris-Sud, LRI, 91405 Orsay, France / Anne Auger EMAIL, Nikolaus Hansen EMAIL Inria & CMAP, Ecole polytechnique, 91128 Palaiseau, France
Pseudocode | No | Definition 5 (IGO algorithms) The IGO algorithm associated with parametrization θ, sample size N and step size δt is the following update rule for the parameter θ^t. At each step, N sample points x_1, ..., x_N are drawn according to the distribution P_θ^t. The parameter is updated according to θ^{t+δt} = θ^t + δt Σ_{i=1}^{N} ŵ_i ∇̃_θ ln P_θ(x_i) |_{θ=θ^t} (16) = θ^t + δt I^{-1}(θ^t) Σ_{i=1}^{N} ŵ_i ∂ ln P_θ(x_i)/∂θ |_{θ=θ^t} (17)
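Definition 5 is abstract; as a concrete instance, the following sketch (ours, not from the paper) applies the update to a family of independent Bernoulli(θ_j) distributions on {0,1}^d, the case in which the paper notes IGO recovers PBIL/cGA-style rules. Here the Fisher matrix is diagonal with I_jj = 1/(θ_j(1−θ_j)), so the natural-gradient step simplifies to θ_j ← θ_j + δt Σ_i ŵ_i (x_{i,j} − θ_j). The function name and clipping bounds are our own choices.

```python
import numpy as np

def igo_step_bernoulli(theta, f, N=100, dt=0.1, q_cut=0.2, rng=None):
    """One IGO update (Definition 5) for independent Bernoulli(theta_j) on {0,1}^d.

    For this family the Fisher matrix is diagonal, I_jj = 1/(theta_j (1 - theta_j)),
    and it cancels against the same factor in the vanilla gradient of ln P_theta, so
    the natural-gradient update reduces to theta += dt * sum_i w_i (x_i - theta).
    """
    rng = rng or np.random.default_rng()
    d = theta.shape[0]
    x = (rng.random((N, d)) < theta).astype(float)  # N samples drawn from P_theta
    fx = np.array([f(xi) for xi in x])
    # Quantile-based selection weights w(q) = 1_{q <= q_cut}: the best q_cut
    # fraction of the sample gets weight 1 (normalized by N), the rest weight 0.
    ranks = np.argsort(np.argsort(fx))              # rank 0 = best (minimization)
    w = (ranks < q_cut * N).astype(float) / N
    # Natural-gradient step; clip to keep the distribution non-degenerate.
    theta = theta + dt * (w @ (x - theta))
    return np.clip(theta, 1e-3, 1 - 1e-3)
```

Iterating this step on a linear objective drives θ toward the optimum's vertex of the hypercube.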
Open Source Code | Yes | The code used for these experiments can be found at http://www.ludovicarnold.com/projects:igocode .
Open Datasets | Yes | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. We ran both the IGO algorithm as described above, and a version using the vanilla gradient instead of the natural gradient (that is, omitting the Fisher matrix in the IGO update). The dimension was d = 40 and we used an RBM with only one latent variable (dh = 1).
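The two-min objective quoted above is simple enough to state directly in code; this is a minimal sketch (the helper name `two_min` is ours):

```python
import numpy as np

def two_min(x, y):
    """Two-min objective on {0,1}^d based at y:
    f_y(x) = min(sum_i |x_i - y_i|, sum_i |(1 - x_i) - y_i|).

    It has two optima with value 0: x = y, and the binary complement x = 1 - y.
    """
    x, y = np.asarray(x), np.asarray(y)
    return min(np.abs(x - y).sum(), np.abs((1 - x) - y).sum())
```

Both x = y and its complement evaluate to 0, which is what makes the function a test of whether the optimization trajectory commits to one optimum or hedges between the two.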
Dataset Splits | No | We tested the resulting IGO trajectories on a simple objective function with two optima on {0, 1}^d, namely, the two-min function based at y defined as f_y(x) = min(Σ_i |x_i − y_i|, Σ_i |(1 − x_i) − y_i|), which, as a function of x, has two optima, one at x = y and the other at its binary complement x = ȳ. The value of the base point y was randomized for each independent run. The experiment uses a synthetic objective function, not a dataset that typically requires train/test/validation splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instances) are provided in the paper for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper.
Experiment Setup | Yes | The dimension was d = 40 and we used an RBM with only one latent variable (dh = 1). [...] We used a large sample size of N = 10,000 for Monte Carlo sampling, so as to be close to the theoretical IGO flow behavior. We also tested a smaller, more realistic sample size of N = 10 (still keeping N_Fish = 10,000), with similar but noisier results. The selection scheme (Section 2.2) was w(q) = 1_{q ≤ 1/5} (cf. Rechenberg 1994), so that the best 20% points in the sample are given weight 1 for the update. The RBM was initialized so that at startup, the distribution P_θ0 is close to uniform on (x, h), in line with Proposition 2. Explicitly, we set w_ij ~ N(0, 1/(d·dh)) and then b_j ← −Σ_i w_ij/2 and a_i ← −Σ_j w_ij/2 + N(0, 0.01/d²), which ensures a close-to-uniform initial distribution. Full experimental details, including detailed setup and additional results, can be found in a previous version of this article (Ollivier et al., 2011, Section 5).
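The initialization formulas above can be sketched as follows. This is our reading of the quoted setup, with the minus signs and sums reconstructed from the stated goal of a close-to-uniform P_θ0 (the biases cancel the mean field of the weights); treat it as an illustration, not the authors' exact code.

```python
import numpy as np

def init_rbm_near_uniform(d, dh, rng=None):
    """Initialize RBM parameters (visible biases a, hidden biases b, weights w)
    so that the initial distribution over (x, h) is close to uniform, as in the
    quoted setup: w_ij ~ N(0, 1/(d*dh)), b_j = -sum_i w_ij / 2, and
    a_i = -sum_j w_ij / 2 plus small N(0, 0.01/d^2) noise to break symmetry.
    """
    rng = rng or np.random.default_rng()
    w = rng.normal(0.0, np.sqrt(1.0 / (d * dh)), size=(d, dh))
    b = -w.sum(axis=0) / 2                      # hidden biases cancel column sums
    a = -w.sum(axis=1) / 2 + rng.normal(0.0, np.sqrt(0.01 / d**2), size=d)
    return a, b, w
```

With the paper's settings (d = 40, dh = 1) the weights have standard deviation 1/√40 ≈ 0.16 and the bias noise is tiny (std 0.0025), so the induced distribution starts very close to uniform.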