Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems. ... Figure 2 shows the convergence rate of the algorithm in all three settings as a function of ϵ, where we confirm that scalings in practice corroborate our theory quite accurately. |
| Researcher Affiliation | Collaboration | Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright. Machine Learning Department, Carnegie Mellon University; Departments of Electrical Engineering & Computer Sciences and Statistics, University of California, Berkeley; Voleon Group, Berkeley. |
| Pseudocode | Yes | Algorithm 1 (Stochastic Zero-Order Method): Given iteration number T ≥ 1, initial point x₀ ∈ X, step size η > 0, and smoothing radius r > 0. For t ∈ {0, 1, . . . , T−1}: sample ξₜ ∼ D and uₜ ∼ Unif(S^{d−1}); compute the gradient estimate g(xₜ); update xₜ₊₁ ← xₜ − η g(xₜ). Return x_T. |
| Open Source Code | No | The paper does not explicitly provide a link to source code, state that code is available in supplementary materials, or affirm that code for the described methodology is being released. |
| Open Datasets | No | The paper defines its own LQR problem instances for simulation within Appendix D, rather than utilizing external, publicly available datasets requiring concrete access information like links or citations. For example: "To generate the plot in Figure 1 (a), we used the following one dimensional LQR problem: A = 5, B = 0.33, Q = 1, R = 1" and "We randomly generated A, B, Q and R as 8 × 8 matrices." |
| Dataset Splits | No | The paper conducts simulation experiments based on defined LQR problems and does not involve typical machine learning datasets that would require explicit training, validation, or test splits. The experimental setup describes how problem parameters and initial conditions are chosen for simulation runs, not data partitioning. |
| Hardware Specification | No | Appendix D 'Experimental Details & Additional Experiments' describes the LQR problems used, how initial conditions were chosen, and general experimental parameters like step size and rollout length. However, it does not mention any specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the simulations. |
| Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that were used in the implementation of the algorithms or experiments. |
| Experiment Setup | Yes | For each LQR problem used, the initial K₀ was picked by randomly perturbing the entries of K⋆. The step size was tuned manually, and the smoothing radius was always chosen to be the minimum of ϵ and the largest value required to ensure stability. The rollout length was also tuned manually until the cost from a rollout converged arbitrarily close to the true value. ... For three different values of C(K₀), we picked 8 evenly spaced (logarithmic scale) values of ϵ in the interval (0.005, 1). |
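The zero-order update quoted in the Pseudocode row can be sketched on the paper's 1-D LQR instance from Appendix D (A = 5, B = 0.33, Q = 1, R = 1). This is a minimal illustration under stated assumptions, not the authors' implementation: it evaluates the exact infinite-horizon cost in place of noisy finite rollouts, uses the two-point gradient estimate for stability, and the step size, smoothing radius, and iteration count are illustrative choices.

```python
import random

# 1-D LQR instance quoted from Appendix D of the paper.
A, B, Q, R = 5.0, 0.33, 1.0, 1.0

def lqr_cost(K, x0=1.0):
    """Infinite-horizon cost of the static policy u_t = -K x_t.

    The closed loop x_{t+1} = (A - B*K) x_t is stable iff |A - B*K| < 1,
    in which case the cost sums to (Q + R K^2) x0^2 / (1 - (A - B*K)^2).
    """
    rho = A - B * K
    if abs(rho) >= 1.0:
        return float("inf")  # unstable controller: infinite cost
    return (Q + R * K * K) * x0 * x0 / (1.0 - rho * rho)

def zero_order_lqr(K0, eta=1e-3, r=0.01, T=5000, seed=0):
    """Zero-order descent on C(K), mirroring Algorithm 1's loop structure.

    In d = 1 dimension, u is drawn uniformly from {-1, +1} (the 0-sphere),
    and the two-point estimate g = (d / 2r) (C(K + r u) - C(K - r u)) u
    reduces to a symmetric finite difference of the cost.
    """
    rng = random.Random(seed)
    K = K0
    for _ in range(T):
        u = rng.choice((-1.0, 1.0))
        g = (1.0 / (2.0 * r)) * (lqr_cost(K + r * u) - lqr_cost(K - r * u)) * u
        K -= eta * g  # gradient step: K_{t+1} = K_t - eta * g(K_t)
    return K
```

Starting from a stabilizing controller such as K₀ = 15 (so |A − B·K₀| = 0.05 < 1), the iterates decrease the cost toward the minimizer of C(K); with an unstable start the cost is infinite and the method has no signal, which is why the paper perturbs K⋆ to obtain K₀.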