Long-Context Linear System Identification

Authors: Oğuz Kaan Yüksel, Mathieu Even, Nicolas Flammarion

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we confirm these statistical rates through experiments that verify the scaling laws predicted by problem parameters. Due to space constraints, these experiments are provided in Section E. ... All experiments in this section are implemented with Python 3 (Van Rossum & Drake, 2009) under PSF license and PyTorch (Paszke et al., 2019) under BSD-3-Clause license. In addition, we use NumPy (Harris et al., 2020) under BSD license. For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled... Figure 1 plots the estimation error for d ∈ {5, 10, 15}, p ∈ {5, 10, 15}, N ∈ {1, 5, 10} and T ∈ {1, 5, 10, 25, 50} · pdr/N. The upper bound in Theorem 4.1 scales with the ratio β/γ up to logarithmic terms, as empirically verified by Figure 1.
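The generation recipe quoted above can be sketched in a few lines of NumPy. This is one reading of the recipe, not the authors' code: the per-block scaling α/p (so the blocks' spectral norms sum to α < 1) and QR-based orthogonal sampling are assumptions made for the sketch.

```python
import numpy as np

def generate_teacher(d, p, alpha=0.5, seed=None):
    """Sample p orthogonal d x d blocks and scale each by alpha / p.

    Assumed reading of the recipe: scaling each block by alpha / p keeps
    the sum of the blocks' spectral norms at alpha < 1 (stability).
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(p):
        # QR of a Gaussian matrix yields a Haar-distributed orthogonal Q.
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append((alpha / p) * q)
    return np.stack(blocks)  # shape (p, d, d)

A = generate_teacher(d=5, p=3, alpha=0.5, seed=0)
print(A.shape)                      # (3, 5, 5)
print(np.linalg.norm(A[0], ord=2))  # spectral norm of one block: 0.5 / 3
```

Because each block is a scaled orthogonal matrix, its spectral norm is exactly α/p, which makes the stability constraint easy to verify numerically.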
Researcher Affiliation Academia Oğuz Kaan Yüksel (EPFL, Lausanne, Switzerland); Mathieu Even (Inria & ENS, Paris, France); Nicolas Flammarion (EPFL, Lausanne, Switzerland)
Pseudocode No The paper describes methods and derivations mathematically and explains experimental procedures in text, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code No The paper states that experiments are implemented using Python 3, PyTorch, and NumPy, but there is no explicit statement about releasing the authors' own source code for the methodology described, nor a link to a code repository.
Open Datasets No The paper describes a data generation process for its experiments: 'For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled. These are then scaled down by α/p where α is arbitrarily set to 0.5. In cases where A needs to be initialized, we use the same recipe for the student model with p̂ instead of p and set α = 1.' There is no mention of using publicly available or open datasets.
Dataset Splits No The paper describes a synthetic data generation process where 'N independent sequences of length T > p' are generated for system identification. However, it does not explicitly mention traditional training, validation, or test dataset splits, as the focus is on estimating known ground-truth parameters from generated trajectories rather than evaluating generalization on separate data partitions.
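As the row above notes, the data are N independent trajectories rolled out from a known teacher rather than a fixed dataset with splits. A minimal simulation sketch follows; the AR(p)-style dynamics x_t = Σ_i A_i x_{t−i} + w_t and the unit Gaussian noise are assumptions, since the table does not quote the paper's exact noise model.

```python
import numpy as np

def simulate(A, T, N, noise_std=1.0, seed=None):
    """Roll out N independent length-T trajectories of
    x_t = sum_i A[i] @ x_{t-1-i} + w_t  (assumed dynamics)."""
    rng = np.random.default_rng(seed)
    p, d, _ = A.shape
    X = np.zeros((N, T + p, d))  # p zero-padded warm-up steps
    for t in range(p, T + p):
        X[:, t] = sum(X[:, t - 1 - i] @ A[i].T for i in range(p))
        X[:, t] += noise_std * rng.standard_normal((N, d))
    return X[:, p:]  # drop the warm-up, shape (N, T, d)

# Teacher as in the recipe above: p orthogonal blocks scaled by alpha / p.
rng = np.random.default_rng(0)
A = np.stack([(0.5 / 3) * np.linalg.qr(rng.standard_normal((5, 5)))[0]
              for _ in range(3)])
X = simulate(A, T=50, N=10, seed=1)
print(X.shape)  # (10, 50, 5)
```

The warm-up padding avoids special-casing the first p steps; because the blocks' norms sum to 0.5 < 1, the rolled-out states stay bounded.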
Hardware Specification No The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance specifications used for running the experiments. It only mentions the software used for implementation.
Software Dependencies No The paper states that experiments are implemented with 'Python 3', 'PyTorch', and 'NumPy'. While 'Python 3' indicates a major version, specific minor versions for Python and precise version numbers for PyTorch and NumPy are not provided.
Experiment Setup Yes For all the experiments, A is generated as follows. First, p orthogonal matrices of shape d × d are sampled. These are then scaled down by α/p where α is arbitrarily set to 0.5. In cases where A needs to be initialized, we use the same recipe for the student model with p̂ instead of p and set α = 1. For Theorems 4.1 and 4.3, Â is computed with the OLS estimator, and for Theorem 4.2, Â is learned with gradient descent with learning rate α on the group-norm regularized loss in Equation (9). The parameter λ and the learning rate α are tuned by a grid search. ... λ ∈ {10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷} and learning rate α ∈ {10⁻¹, 10⁻², 10⁻³}.
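The setup quoted above can be sketched end to end: generate a teacher, roll out trajectories, fit the OLS estimator, and grid-search the regularized variant. This is an illustration under assumptions, not the authors' code: the dynamics and noise model, the subgradient handling of the group-norm penalty, the number of GD steps, and the use of the known teacher as the selection criterion are all choices made here for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, alpha, N, T = 4, 2, 0.5, 20, 200

# Teacher: p orthogonal blocks scaled by alpha / p (assumed scaling).
A = np.stack([(alpha / p) * np.linalg.qr(rng.standard_normal((d, d)))[0]
              for _ in range(p)])
M_true = np.concatenate(A, axis=1)  # stacked [A_1 ... A_p], shape (d, p*d)

# Simulate N trajectories of x_t = sum_i A[i] x_{t-1-i} + w_t.
X = np.zeros((N, T + p, d))
for t in range(p, T + p):
    X[:, t] = sum(X[:, t - 1 - i] @ A[i].T for i in range(p))
    X[:, t] += rng.standard_normal((N, d))
X = X[:, p:]

# Regression pairs: z_t = [x_{t-1}, ..., x_{t-p}] (flattened), target x_t.
Z = np.concatenate([X[:, t - p:t][:, ::-1].reshape(N, p * d)
                    for t in range(p, T)])
Y = np.concatenate([X[:, t] for t in range(p, T)])

# OLS estimator of M = [A_1 ... A_p].
M_ols = np.linalg.lstsq(Z, Y, rcond=None)[0].T
err_ols = np.linalg.norm(M_ols - M_true) / np.linalg.norm(M_true)
print(f"relative OLS error: {err_ols:.3f}")

# Grid search for the regularized variant: subgradient descent on
# 0.5 * ||Z M^T - Y||^2 / n + lam * sum_i ||M_i||_F  (a sketch).
def gd_group(lam, lr, steps=200):
    M = np.zeros((d, p * d))
    for _ in range(steps):
        grad = (M @ Z.T - Y.T) @ Z / len(Z)
        for i in range(p):
            blk = M[:, i * d:(i + 1) * d]
            nrm = np.linalg.norm(blk)
            if nrm > 1e-12:  # subgradient of the Frobenius group norm
                grad[:, i * d:(i + 1) * d] += lam * blk / nrm
        M -= lr * grad
    return M

grids = [(lam, lr) for lam in (10.0 ** -k for k in range(1, 8))
         for lr in (1e-1, 1e-2, 1e-3)]
# Selection by error to the known teacher (illustrative criterion only).
best_lam, best_lr = min(
    grids, key=lambda g: np.linalg.norm(gd_group(*g) - M_true))
print("selected (lambda, lr):", (best_lam, best_lr))
```

Reversing the history slice makes block j of z_t equal x_{t−1−j}, so the stacked matrix lines up column-block by column-block with [A_1 ... A_p], and the two estimators can be compared directly in Frobenius norm.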