Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DiLQR: Differentiable Iterative Linear Quadratic Regulator via Implicit Differentiation
Authors: Shuyuan Wang, Philip D Loewen, Michael Forbes, Bhushan Gopaluni, Wei Pan
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework on imitation tasks on famous control benchmarks. Our analytical method demonstrates superior computational performance, achieving up to 128x speedup and a minimum of 21x speedup compared to automatic differentiation. Our method also demonstrates superior learning performance (106x) compared to traditional neural network policies and better model loss with differentiable controllers that lack exact analytical gradients. Furthermore, we integrate our module into a larger network with visual inputs to demonstrate the capacity of our method for high-dimensional, fully end-to-end tasks. |
| Researcher Affiliation | Collaboration | 1The University of British Columbia, Vancouver, Canada 2Honeywell Process Solutions, Vancouver, Canada 3The University of Manchester, Manchester, England. |
| Pseudocode | Yes | Algorithm 1 Forward Algorithm — 1: Input: ∂θDt, Dt; 2: Initialize ∂θx0 = 0; 3: for time step t = 1, 2, …, T do; 4: obtain ∂θxt through (15); 5: obtain ∂θDt with ∂θxt and (14), and obtain ∂θdt with ∂θxt and (16); 6: end for; 7: return ∂θD, ∂θd |
| Open Source Code | Yes | Codes can be found on the project homepage https://sites.google.com/view/dilqr/. |
| Open Datasets | Yes | We conduct experiments on two well-known control benchmarks: Cart Pole and Inverted Pendulum. |
| Dataset Splits | No | No specific dataset split percentages or exact counts for train/validation/test sets are provided in the main text. The paper mentions "train=50 and train=100 denote the number of expert trajectories available during training," which indicates the size of the training data but not its split from a larger dataset or the sizes of validation/test sets. |
| Hardware Specification | Yes | All experiments were carried out on a platform with an AMD 3700X 3.6GHz CPU, 16GB RAM, and an RTX3080 GPU with 10GB VRAM. |
| Software Dependencies | No | The experiments are implemented with Pytorch (Paszke et al., 2019). No specific version number for PyTorch is mentioned, nor are other software dependencies with version numbers. |
| Experiment Setup | Yes | The NN setting was optimized with Adam with a learning rate of 10^-4, and all other settings were optimized with RMSprop with a learning rate of 10^-2 and a decay term of 0.5. |
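The RMSprop configuration quoted above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation (the authors use PyTorch): it assumes the reported "decay term of 0.5" is the exponential-averaging coefficient for the squared-gradient cache, which is an unconfirmed reading, and the toy quadratic loss is invented for demonstration.

```python
def rmsprop_step(param, grad, cache, lr=1e-2, decay=0.5, eps=1e-8):
    """One RMSprop update on a scalar parameter.

    lr=1e-2 and decay=0.5 mirror the values reported in the paper;
    the interpretation of "decay term" as the squared-gradient
    averaging coefficient is an assumption. cache holds the running
    average of squared gradients.
    """
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (cache ** 0.5 + eps)
    return param, cache

# Toy example: minimize L(w) = w^2, whose gradient is 2w.
w, cache = 1.0, 0.0
for _ in range(100):
    w, cache = rmsprop_step(w, 2.0 * w, cache)
```

Because RMSprop normalizes by the root of the squared-gradient average, each step has magnitude close to `lr`, so the iterate drifts toward the minimum and then oscillates in a small band around it.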