Understanding Deflation Process in Over-parametrized Tensor Decomposition
Authors: Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that for orthogonally decomposable tensors, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly to greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors. |
| Researcher Affiliation | Academia | Rong Ge (Duke University); Yunwei Ren* (Shanghai Jiao Tong University); Xiang Wang* (Duke University); Mo Zhou* (Duke University) |
| Pseudocode | Yes | Algorithm 1 Tensor Deflation Process |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper constructs a synthetic tensor (T = Σ_{i∈[5]} a_i e_i^{⊗4}) for illustrative purposes in Figure 1, but does not use or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments that require specifying training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware used for experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers required to replicate the work. |
| Experiment Setup | Yes | "Input: Number of components m, initialization scale δ0, re-initialization threshold δ1, increasing rate of epoch length γ, target accuracy ϵ, regularization coefficient λ" and "Theorem 1. For any ϵ ≥ exp(−o(d/log d)), there exist γ = Θ(1), m = poly(d), λ = min{O(log d/d), O(ϵ/d^(1/2))}, α = min{O(λ/d^(3/2)), O(λ^2), O(ϵ^2/d^4)}, δ1 = O(α^(3/2)/m^(1/2)), δ0 = Θ(δ1 α/log^(1/2)(d)) such that with probability 1 − 1/poly(d) in the (re)-initializations, Algorithm 2 terminates in O(log(d/ϵ)) epochs and returns a tensor T such that …" |
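The paper's Algorithm 2 is a modified gradient flow that provably follows a deflation process. As a rough, self-contained illustration of that deflation idea on the synthetic tensor T = Σ_{i∈[5]} a_i e_i^{⊗4} from Figure 1, the sketch below uses plain tensor power iteration plus rank-1 subtraction; this is not the paper's gradient-flow algorithm, and all function names here are our own:

```python
import numpy as np

def tensor_power_step(T, u):
    """One tensor power-iteration step: v_j = T[j,k,l,m] u_k u_l u_m, normalized."""
    v = np.einsum('jklm,k,l,m->j', T, u, u, u)
    return v / np.linalg.norm(v)

def deflate(T, n_components, n_iter=100, seed=0):
    """Greedy deflation: repeatedly find one component, subtract it, repeat."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    components = []
    for _ in range(n_components):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            u = tensor_power_step(T, u)
        # recovered weight a = T(u, u, u, u)
        a = np.einsum('jklm,j,k,l,m->', T, u, u, u, u)
        # deflation step: subtract the recovered rank-1 term a * u^{⊗4}
        T = T - a * np.einsum('j,k,l,m->jklm', u, u, u, u)
        components.append((a, u))
    return components

# Synthetic orthogonal tensor with 5 components, as in Figure 1 of the paper
# (the weights a_i below are illustrative, not taken from the paper).
d = 10
a = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
T = np.zeros((d, d, d, d))
for i, ai in enumerate(a):
    e = np.zeros(d)
    e[i] = 1.0
    T += ai * np.einsum('j,k,l,m->jklm', e, e, e, e)

comps = deflate(T, n_components=5)
```

Because the tensor is orthogonally decomposable, each power iteration converges to (plus or minus) one basis component, the subtraction removes it almost exactly, and the remaining components are recovered one by one; the paper's point is that a slightly modified gradient flow exhibits this same sequential behavior implicitly.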