Long-Context Linear System Identification

Authors: Oğuz Kaan Yüksel, Mathieu Even, Nicolas Flammarion

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we confirm these statistical rates through experiments that verify the scaling laws predicted by problem parameters. Due to space constraints, these experiments are provided in Section E. ... All experiments in this section are implemented with Python 3 (Van Rossum & Drake, 2009) under PSF license and PyTorch (Paszke et al., 2019) under BSD-3-Clause license. In addition, we use NumPy (Harris et al., 2020) under BSD license. For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled... Figure 1 plots the estimation error for d ∈ {5, 10, 15}, p ∈ {5, 10, 15}, N ∈ {1, 5, 10} and T ∈ {1, 5, 10, 25, 50} · pdr/N. The upper bound in Theorem 4.1 scales with the ratio β/γ up to logarithmic terms, as empirically verified by Figure 1.
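The generation recipe quoted above can be sketched in a few lines of NumPy. This is one reading of the recipe, not the authors' code: the per-block scaling α/p (so the blocks' spectral norms sum to α < 1) and QR-based orthogonal sampling are assumptions made for the sketch.

```python
import numpy as np

def generate_teacher(d, p, alpha=0.5, seed=None):
    """Sample p orthogonal d x d blocks and scale each by alpha / p.

    Assumed reading of the recipe: scaling each block by alpha / p keeps
    the sum of the blocks' spectral norms at alpha < 1 (stability).
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(p):
        # QR of a Gaussian matrix yields a Haar-distributed orthogonal Q.
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append((alpha / p) * q)
    return np.stack(blocks)  # shape (p, d, d)

A = generate_teacher(d=5, p=3, alpha=0.5, seed=0)
print(A.shape)                      # (3, 5, 5)
print(np.linalg.norm(A[0], ord=2))  # spectral norm of one block: 0.5 / 3
```

Because each block is a scaled orthogonal matrix, its spectral norm is exactly α/p, which makes the stability constraint easy to verify numerically.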
Researcher Affiliation Academia Oğuz Kaan Yüksel (EPFL, Lausanne, Switzerland); Mathieu Even (Inria & ENS, Paris, France); Nicolas Flammarion (EPFL, Lausanne, Switzerland)
Pseudocode No The paper describes methods and derivations mathematically and explains experimental procedures in text, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code No The paper states that experiments are implemented using Python 3, PyTorch, and NumPy, but there is no explicit statement about releasing the authors' own source code for the methodology described, nor a link to a code repository.
Open Datasets No The paper describes a data generation process for its experiments: 'For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled. These are then scaled down by α/p where α is arbitrarily set to 0.5. In cases where A needs to be initialized, we use the same recipe for the student model with p̂ instead of p and set α = 1.' There is no mention of using publicly available or open datasets.
Dataset Splits No The paper describes a synthetic data generation process where 'N independent sequences of length T > p' are generated for system identification. However, it does not explicitly mention traditional training, validation, or test dataset splits, as the focus is on estimating known ground-truth parameters from generated trajectories rather than evaluating generalization on separate data partitions.
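As the row above notes, the data are N independent trajectories rolled out from a known teacher rather than a fixed dataset with splits. A minimal simulation sketch follows; the AR(p)-style dynamics x_t = Σ_i A_i x_{t−i} + w_t and the unit Gaussian noise are assumptions, since the table does not quote the paper's exact noise model.

```python
import numpy as np

def simulate(A, T, N, noise_std=1.0, seed=None):
    """Roll out N independent length-T trajectories of
    x_t = sum_i A[i] @ x_{t-1-i} + w_t  (assumed dynamics)."""
    rng = np.random.default_rng(seed)
    p, d, _ = A.shape
    X = np.zeros((N, T + p, d))  # p zero-padded warm-up steps
    for t in range(p, T + p):
        X[:, t] = sum(X[:, t - 1 - i] @ A[i].T for i in range(p))
        X[:, t] += noise_std * rng.standard_normal((N, d))
    return X[:, p:]  # drop the warm-up, shape (N, T, d)

# Teacher as in the recipe above: p orthogonal blocks scaled by alpha / p.
rng = np.random.default_rng(0)
A = np.stack([(0.5 / 3) * np.linalg.qr(rng.standard_normal((5, 5)))[0]
              for _ in range(3)])
X = simulate(A, T=50, N=10, seed=1)
print(X.shape)  # (10, 50, 5)
```

The warm-up padding avoids special-casing the first p steps; because the blocks' norms sum to 0.5 < 1, the rolled-out states stay bounded.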
Hardware Specification No The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance specifications used for running the experiments. It only mentions the software used for implementation.
Software Dependencies No The paper states that experiments are implemented with 'Python 3', 'PyTorch', and 'NumPy'. While 'Python 3' indicates a major version, specific minor versions for Python and precise version numbers for PyTorch and NumPy are not provided.
Experiment Setup Yes For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled. These are then scaled down by α/p where α is arbitrarily set to 0.5. In cases where A needs to be initialized, we use the same recipe for the student model with p̂ instead of p and set α = 1. For Theorems 4.1 and 4.3, Â is computed with the OLS estimator, and for Theorem 4.2, Â is learned with gradient descent with learning rate α on the group-norm regularized loss in Equation (9). The parameter λ and the learning rate α are tuned by a grid search. ... λ ∈ {10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷} and learning rate α ∈ {10⁻¹, 10⁻², 10⁻³}.
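The setup quoted above can be sketched end to end: generate a teacher, roll out trajectories, fit the OLS estimator, and grid-search the regularized variant. This is an illustration under assumptions, not the authors' code: the dynamics and noise model, the subgradient handling of the group-norm penalty, the number of GD steps, and the use of the known teacher as the selection criterion are all choices made here for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, alpha, N, T = 4, 2, 0.5, 20, 200

# Teacher: p orthogonal blocks scaled by alpha / p (assumed scaling).
A = np.stack([(alpha / p) * np.linalg.qr(rng.standard_normal((d, d)))[0]
              for _ in range(p)])
M_true = np.concatenate(A, axis=1)  # stacked [A_1 ... A_p], shape (d, p*d)

# Simulate N trajectories of x_t = sum_i A[i] x_{t-1-i} + w_t.
X = np.zeros((N, T + p, d))
for t in range(p, T + p):
    X[:, t] = sum(X[:, t - 1 - i] @ A[i].T for i in range(p))
    X[:, t] += rng.standard_normal((N, d))
X = X[:, p:]

# Regression pairs: z_t = [x_{t-1}, ..., x_{t-p}] (flattened), target x_t.
Z = np.concatenate([X[:, t - p:t][:, ::-1].reshape(N, p * d)
                    for t in range(p, T)])
Y = np.concatenate([X[:, t] for t in range(p, T)])

# OLS estimator of M = [A_1 ... A_p].
M_ols = np.linalg.lstsq(Z, Y, rcond=None)[0].T
err_ols = np.linalg.norm(M_ols - M_true) / np.linalg.norm(M_true)
print(f"relative OLS error: {err_ols:.3f}")

# Grid search for the regularized variant: subgradient descent on
# 0.5 * ||Z M^T - Y||^2 / n + lam * sum_i ||M_i||_F  (a sketch).
def gd_group(lam, lr, steps=200):
    M = np.zeros((d, p * d))
    for _ in range(steps):
        grad = (M @ Z.T - Y.T) @ Z / len(Z)
        for i in range(p):
            blk = M[:, i * d:(i + 1) * d]
            nrm = np.linalg.norm(blk)
            if nrm > 1e-12:  # subgradient of the Frobenius group norm
                grad[:, i * d:(i + 1) * d] += lam * blk / nrm
        M -= lr * grad
    return M

grids = [(lam, lr) for lam in (10.0 ** -k for k in range(1, 8))
         for lr in (1e-1, 1e-2, 1e-3)]
# Selection by error to the known teacher (illustrative criterion only).
best_lam, best_lr = min(
    grids, key=lambda g: np.linalg.norm(gd_group(*g) - M_true))
print("selected (lambda, lr):", (best_lam, best_lr))
```

Reversing the history slice makes block j of z_t equal x_{t−1−j}, so the stacked matrix lines up column-block by column-block with [A_1 ... A_p], and the two estimators can be compared directly in Frobenius norm.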