Proximal Distance Algorithms: Theory and Practice
Authors: Kevin L. Keys, Hua Zhou, Kenneth Lange
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM). |
| Researcher Affiliation | Academia | Kevin L. Keys EMAIL Department of Medicine University of California, San Francisco CA 94158, USA; Hua Zhou EMAIL Department of Biostatistics University of California, Los Angeles CA 90095-1772, USA; Kenneth Lange EMAIL Departments of Biomathematics, Human Genetics, and Statistics University of California, Los Angeles CA 90095-1766, USA |
| Pseudocode | Yes | ALGORITHM 1: A typical proximal distance algorithm |
| Open Source Code | Yes | Source code for the numerical examples can be found at https://github.com/klkeys/proxdist. |
| Open Datasets | Yes | The breast cancer data from PMA provide the data matrix X. The data consist of p = 19,672 RNA measurements on n = 89 patients. The two figures show computation times and the proportion of variance explained (PVE) by the p × q loading matrix U. For sparse PCA, PVE is defined as tr(Xqᵀ Xq)/tr(Xᵀ X), where Xq = XU(UᵀU)⁻¹Uᵀ (Shen and Huang, 2008). When the loading vectors of U are orthogonal, this criterion reduces to the familiar definition tr(Uᵀ Xᵀ XU)/tr(Xᵀ X) of PVE for ordinary PCA. The proximal distance algorithm enforces either matrix-wise or column-wise sparsity. In contrast, SPC enforces only column-wise sparsity via the constraint ‖ui‖₁ ≤ c for each column ui of U. We take c = 8. The number of nonzeroes per loading vector output by SPC dictates the sparsity level for the column-wise version of the proximal distance algorithm. Summing these counts across all columns dictates the sparsity level for the matrix version of the proximal distance algorithm. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. While it mentions using data like "breast cancer data from PMA" or generating data with "standard normal deviates", it does not specify how this data is partitioned for experiments (e.g., percentages for train/test/validation sets, or specific methodology for splits). |
| Hardware Specification | No | The paper mentions CPU times for various algorithms and problem sizes, indicating that computational experiments were performed. However, it does not provide any specific details about the hardware used (e.g., CPU models, GPU models, memory, or cloud instances) on which these timings were measured. |
| Software Dependencies | No | All of our examples are coded in the Julia programming language. Whenever possible, competing software was run in the Julia environment via the Julia module MathProgBase (Dunning et al., 2017; Lubin and Dunning, 2015). The sparse PCA problem relies on the software of Witten et al. (Witten et al., 2009), which is coded in R. The paper mentions various software and programming languages (Julia, MathProgBase, R, SCS, Gurobi, Ipopt, Mosek) but generally lacks specific version numbers for these dependencies. |
| Experiment Setup | Yes | Convergence is tested at iteration k by the two criteria \|f(x_k) − f(x_{k−1})\| ≤ ϵ₁[\|f(x_{k−1})\| + 1] and dist(x_k, C) ≤ ϵ₂, where ϵ₁ = 10⁻⁶ and ϵ₂ = 10⁻⁴ are typical values. For dense problems, the proximal distance algorithms start the penalty constant ρ at 1 and double it every 100 iterations. For sparse problems the proximal distance algorithms update ρ by a factor of 1.5 every 50 iterations. For the proximal distance algorithm, we start ρ at 1 and multiply it by 1.5 every 200 iterations. In algorithm PD1 we set ρ_k = max{1.2ᵏ, 2²²}. In the accelerated versions PD2 and PD3 we started ρ at 1 and multiplied it by 5 every 100 iterations. Dense problems converge quickly and accurately when we set ρ₀ = 1 and double ρ every 100 iterations. Sparse problems require a greater range and faster updates of ρ, so we set ρ₀ = 0.01 and then multiply ρ by 2.5 every 10 iterations. The choice ρ_k = 1.2ᵏ converges in 600 to 700 iterations from random starting points and reliably yields objective values below 10⁻⁵ for Horn matrices. |
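The Open Datasets row defines PVE as tr(Xqᵀ Xq)/tr(Xᵀ X) with Xq = XU(UᵀU)⁻¹Uᵀ. A minimal NumPy sketch of that criterion follows; the function name, toy data, and shapes are illustrative assumptions, not taken from the paper's `proxdist` code:

```python
import numpy as np

def pve(X, U):
    """Proportion of variance explained by a loading matrix U (Shen and Huang, 2008):
    PVE = tr(Xq' Xq) / tr(X' X), where Xq = X U (U'U)^{-1} U'."""
    # Solve (U'U) B = U' instead of forming an explicit inverse, for stability.
    B = np.linalg.solve(U.T @ U, U.T)      # shape (q, p)
    Xq = X @ (U @ B)                       # rows of X projected onto span(U)
    return np.trace(Xq.T @ Xq) / np.trace(X.T @ X)

# Toy check: for orthonormal loadings the criterion reduces to the ordinary
# PCA definition tr(U' X'X U) / tr(X'X), as the row above states.
rng = np.random.default_rng(0)
X = rng.standard_normal((89, 50))                  # stand-in for the n x p data matrix
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))  # orthonormal p x q loadings
ordinary = np.trace(U.T @ X.T @ X @ U) / np.trace(X.T @ X)
```

Solving the small q × q system keeps the cost linear in p, which matters at the paper's scale of p ≈ 19,672 transcripts.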