On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing
Authors: Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that when the number of samples n is sufficiently large, ‖F_t F_tᵀ − X*‖²_F converges to a final statistical error of O(kdσ²/n) after O(√(n/(dσ²))) iterations, namely sub-linearly... Furthermore, O(√(n/(dσ²))) iterations are needed in the over-parameterized setting because the local curvature of the loss function (1) around the global minima is not quadratic, and therefore FGD converges only sub-linearly to the global minima; see the simulations in Figure 1 for an illustration. In the simulations, we consider the dimension d = 20, the true rank r = 3, and the number of samples n = 200. We first generate random orthonormal matrices U and V such that the union of their column spaces is ℝ^d. We set D_S to be a diagonal matrix with its (1, 1), (2, 2), (3, 3) entries equal to 1, 0.9, 0.8 respectively, and zero elsewhere. Hence X* = U D_S Uᵀ. The upper-triangular entries of the sensing matrices A_i are sampled from a standard Gaussian distribution, and the lower-triangular entries are filled in so that each A_i is symmetric. We further assume that there is no observation noise, so that we have a better understanding of the convergence behavior of the algorithm. Let {F_t}_t be the sequence generated by the FGD method as in equation (3) with η = 0.1. The simulation results are shown in Figure 1. |
| Researcher Affiliation | Academia | Jiacheng Zhuo, Department of Computer Science, University of Texas, Austin, TX 78712, USA; Jeongyeol Kwon, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53705, USA; Nhat Ho, Department of Statistics and Data Sciences, University of Texas, Austin, TX 78712, USA; Constantine Caramanis, Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712, USA |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. It describes the Factorized Gradient Descent (FGD) method using mathematical equations and textual descriptions, for example: 'F_{t+1} = F_t − η G_tⁿ, where G_tⁿ = ∇L(F_t) = (1/n) Σ_{i=1}^{n} (⟨A_i, F_t F_tᵀ⟩ − y_i) A_i F_t, (3) where η is the step size and G_tⁿ denotes the gradient evaluated at iteration t with n i.i.d. samples.' |
| Open Source Code | No | The paper does not provide any statement regarding the release of source code, nor does it include links to a code repository or mention code in supplementary materials. |
| Open Datasets | No | In the simulations, we consider the dimension d = 20, the true rank r = 3, and the number of samples n = 200. We first generate random orthonormal matrices U and V such that the union of their column spaces is ℝ^d. We set D_S to be a diagonal matrix, with its (1, 1), (2, 2), (3, 3) entries being 1, 0.9, 0.8 respectively, and zero elsewhere. Hence X* = U D_S Uᵀ. The upper-triangular entries of the sensing matrices A_i are sampled from a standard Gaussian distribution, and the lower-triangular entries are filled in so that each A_i is symmetric. We further assume that there is no observation noise... |
| Dataset Splits | No | The paper describes the generation of synthetic data for simulations: 'In the simulations, we consider the dimension d = 20, the true rank r = 3, and the number of samples n = 200. We first generate random orthonormal matrices U and V... The upper triangle entries of the sensing matrices Ai are sampled from standard Gaussian distribution...' However, it does not provide details on training, validation, or test splits, as the experiments are simulations to analyze convergence properties rather than model training on partitioned data. |
| Hardware Specification | No | The paper describes a simulation setup and presents results from these simulations, but it does not specify any particular hardware (e.g., GPU, CPU models, memory) used to conduct these simulations or experiments. |
| Software Dependencies | No | The paper focuses on theoretical analysis and simulation of the Factorized Gradient Descent (FGD) method. It does not provide any specific software names with version numbers, such as programming languages, libraries, or solvers used in its implementation or analysis. |
| Experiment Setup | Yes | In the simulations, we consider the dimension d = 20, the true rank r = 3, and the number of samples n = 200... Let {F_t}_t be the sequence generated by the FGD method as in equation (3) with η = 0.1. |
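The simulation described in the table (d = 20, r = 3, n = 200, symmetric Gaussian sensing matrices, noiseless observations, FGD with η = 0.1) can be sketched in NumPy as follows. This is a minimal reconstruction, not the authors' code: the over-specified rank `k = 5`, the small random initialization scale, the random seed, and the 500-iteration budget are assumptions not stated in the quoted excerpts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 20, 3, 200   # dimension, true rank, number of samples (from the paper)
k = 5                  # over-specified rank k > r (hypothetical choice)
eta = 0.1              # step size from the paper's simulation

# Ground truth X* = U D_S U^T with nonzero singular values 1, 0.9, 0.8.
U = np.linalg.qr(rng.standard_normal((d, d)))[0]
D_S = np.zeros(d)
D_S[:r] = [1.0, 0.9, 0.8]
X_star = (U * D_S) @ U.T

# Symmetric sensing matrices: sample the upper triangle from a standard
# Gaussian and mirror it into the lower triangle.
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append(np.triu(G) + np.triu(G, 1).T)
A = np.stack(A)
y = np.einsum('nij,ij->n', A, X_star)   # noiseless observations y_i = <A_i, X*>

# Factorized gradient descent on F (d x k), over-parameterized since k > r:
# F_{t+1} = F_t - eta * (1/n) * sum_i (<A_i, F F^T> - y_i) A_i F_t.
F = 0.01 * rng.standard_normal((d, k))  # small random init (assumption)
errors = []
for t in range(500):
    resid = np.einsum('nij,ij->n', A, F @ F.T) - y
    G_t = np.einsum('n,nij->ij', resid, A) @ F / n
    F = F - eta * G_t
    errors.append(np.linalg.norm(F @ F.T - X_star, 'fro') ** 2)
```

Tracking `errors` over iterations reproduces the kind of sub-linear convergence curve the paper illustrates in Figure 1 for the over-parameterized regime.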