Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems. ... Figure 2 shows the convergence rate of the algorithm in all three settings as a function of ϵ, where we confirm that scalings in practice corroborate our theory quite accurately. |
| Researcher Affiliation | Collaboration | Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright. Machine Learning Department, Carnegie Mellon University; Departments of Electrical Engineering & Computer Sciences and Statistics, University of California, Berkeley; Voleon Group, Berkeley. |
| Pseudocode | Yes | Algorithm 1 (Stochastic Zero-Order Method): Given iteration number T ≥ 1, initial point x₀ ∈ X, step size η > 0, and smoothing radius r > 0. For t ∈ {0, 1, . . . , T−1}: sample ξₜ ∼ D and uₜ ∼ Unif(S^{d−1}); compute the gradient estimate g(xₜ); update xₜ₊₁ ← xₜ − η g(xₜ). Return x_T. |
| Open Source Code | No | The paper does not explicitly provide a link to source code, state that code is available in supplementary materials, or affirm that code for the described methodology is being released. |
| Open Datasets | No | The paper defines its own LQR problem instances for simulation within Appendix D, rather than utilizing external, publicly available datasets requiring concrete access information like links or citations. For example: "To generate the plot in Figure 1 (a), we used the following one dimensional LQR problem: A = 5, B = 0.33, Q = 1, R = 1" and "We randomly generated A, B, Q and R as 8 × 8 matrices." |
| Dataset Splits | No | The paper conducts simulation experiments based on defined LQR problems and does not involve typical machine learning datasets that would require explicit training, validation, or test splits. The experimental setup describes how problem parameters and initial conditions are chosen for simulation runs, not data partitioning. |
| Hardware Specification | No | Appendix D 'Experimental Details & Additional Experiments' describes the LQR problems used, how initial conditions were chosen, and general experimental parameters like step size and rollout length. However, it does not mention any specific hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the simulations. |
| Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that were used in the implementation of the algorithms or experiments. |
| Experiment Setup | Yes | For each LQR problem used, the initial K₀ was picked by randomly perturbing the entries of K⋆. The step size was tuned manually, and the smoothing radius was always chosen to be the minimum of ϵ and the largest value required to ensure stability. The rollout length was also tuned manually until the cost from a rollout converged arbitrarily close to the true value. ... For three different values of C(K₀), we picked 8 evenly spaced (logarithmic scale) values of ϵ in the interval (0.005, 1). |
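The zero-order update quoted in the Pseudocode row can be sketched on the paper's 1-D LQR instance from Appendix D (A = 5, B = 0.33, Q = 1, R = 1). This is a minimal illustration under stated assumptions, not the authors' implementation: it evaluates the exact infinite-horizon cost in place of noisy finite rollouts, uses the two-point gradient estimate for stability, and the step size, smoothing radius, and iteration count are illustrative choices.

```python
import random

# 1-D LQR instance quoted from Appendix D of the paper.
A, B, Q, R = 5.0, 0.33, 1.0, 1.0

def lqr_cost(K, x0=1.0):
    """Infinite-horizon cost of the static policy u_t = -K x_t.

    The closed loop x_{t+1} = (A - B*K) x_t is stable iff |A - B*K| < 1,
    in which case the cost sums to (Q + R K^2) x0^2 / (1 - (A - B*K)^2).
    """
    rho = A - B * K
    if abs(rho) >= 1.0:
        return float("inf")  # unstable controller: infinite cost
    return (Q + R * K * K) * x0 * x0 / (1.0 - rho * rho)

def zero_order_lqr(K0, eta=1e-3, r=0.01, T=5000, seed=0):
    """Zero-order descent on C(K), mirroring Algorithm 1's loop structure.

    In d = 1 dimension, u is drawn uniformly from {-1, +1} (the 0-sphere),
    and the two-point estimate g = (d / 2r) (C(K + r u) - C(K - r u)) u
    reduces to a symmetric finite difference of the cost.
    """
    rng = random.Random(seed)
    K = K0
    for _ in range(T):
        u = rng.choice((-1.0, 1.0))
        g = (1.0 / (2.0 * r)) * (lqr_cost(K + r * u) - lqr_cost(K - r * u)) * u
        K -= eta * g  # gradient step: K_{t+1} = K_t - eta * g(K_t)
    return K
```

Starting from a stabilizing controller such as K₀ = 15 (so |A − B·K₀| = 0.05 < 1), the iterates decrease the cost toward the minimizer of C(K); with an unstable start the cost is infinite and the method has no signal, which is why the paper perturbs K⋆ to obtain K₀.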