Distributed Retraction-Free and Communication-Efficient Optimization on the Stiefel Manifold
Authors: Yilong Song, Peijin Li, Bin Gao, Kun Yuan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments validate our theoretical results. To validate the performance of EF-Landing, we provide experiments on two groups of problems: distributed online PCA in the deterministic scenario, and deep learning in the stochastic scenario using the ResNet-18 (He et al., 2016) neural network architecture with orthogonal constraints applied to the convolutional layers. |
| Researcher Affiliation | Academia | 1Peking University 2Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Correspondence to: Kun Yuan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 EF-Landing |
| Open Source Code | No | All the experiments were implemented in PyTorch and performed using a single GPU. Further experiments can be found in Appendix D. |
| Open Datasets | Yes | We tested the performance of EF-Landing on the CIFAR-10 dataset... The MNIST dataset... The CIFAR-10 dataset... It requires citation (Krizhevsky et al., 2009) for usage. The MNIST dataset only has a citation requirement (LeCun et al., 2010). |
| Dataset Splits | Yes | The MNIST dataset... containing 60,000 training data samples and 10,000 test data samples. The CIFAR-10 dataset... is comprised of 50,000 training samples and 10,000 testing samples. |
| Hardware Specification | No | All the experiments were implemented in PyTorch and performed using a single GPU. |
| Software Dependencies | No | All the experiments were implemented in PyTorch and performed using a single GPU. |
| Experiment Setup | Yes | For the EF-Landing algorithm, the penalty parameter λ was set to 1, and we used three compressors: Top-K and Rand-K with compression retention ratio 0.1, and QSGD with quantization level s = 16. The experiment involved 600 iterations for all algorithms with a fixed learning rate... For each algorithm, the network is trained for 150 epochs, and the learning rate is reduced to 0.1 of its original value after the 100th epoch. Additional experimental details, including the choice of hyper-parameters such as momentum and step size, can be found in Appendix D. |
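
The compressors named in the Experiment Setup row (Top-K and Rand-K with retention ratio 0.1, QSGD with quantization level s = 16) are standard compression operators from the communication-efficient optimization literature. Since no code was released (see the Open Source Code row), the PyTorch sketch below only illustrates the standard definitions of these operators; it is not the paper's implementation.

```python
import torch

def top_k(x: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    """Keep the k = ratio * numel largest-magnitude entries, zero out the rest."""
    flat = x.flatten()
    k = max(1, int(ratio * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

def rand_k(x: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    """Keep k uniformly random entries, rescaled by numel/k so the operator is unbiased."""
    flat = x.flatten()
    k = max(1, int(ratio * flat.numel()))
    idx = torch.randperm(flat.numel(), device=x.device)[:k]
    out = torch.zeros_like(flat)
    out[idx] = flat[idx] * (flat.numel() / k)
    return out.view_as(x)

def qsgd(x: torch.Tensor, s: int = 16) -> torch.Tensor:
    """QSGD-style stochastic quantization of |x_i| / ||x|| onto s levels."""
    norm = x.norm()
    if norm == 0:
        return torch.zeros_like(x)
    level = x.abs() / norm * s            # real-valued level in [0, s]
    lower = level.floor()
    quant = lower + torch.bernoulli(level - lower)  # stochastic rounding up/down
    return torch.sign(x) * norm * quant / s
```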
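
The Experiment Setup row also fixes the landing penalty λ = 1, which refers to the retraction-free (landing-type) update on the Stiefel manifold that EF-Landing builds on. As a rough orientation only, the sketch below shows a generic single-node, uncompressed landing-style step with this penalty; the actual EF-Landing method adds error feedback and distributed communication and is given in Algorithm 1 of the paper. The helper `egrad` (the Euclidean gradient of the objective at `X`) is a hypothetical placeholder.

```python
import torch

def landing_step(X: torch.Tensor, egrad: torch.Tensor,
                 lr: float = 1e-2, lam: float = 1.0) -> torch.Tensor:
    """One generic landing-style update for X with (approximately) orthonormal columns.

    Combines a relative-gradient term that moves roughly along the manifold with a
    penalty term lam * X (X^T X - I) that pulls X back toward the Stiefel manifold,
    avoiding an explicit retraction. This is a sketch of the general landing idea,
    not the paper's EF-Landing algorithm.
    """
    p = X.shape[1]
    G = egrad @ X.T
    skew = 0.5 * (G - G.T)                 # skew-symmetric part of egrad X^T
    manifold_term = skew @ X               # descent direction along the manifold
    penalty_term = X @ (X.T @ X - torch.eye(p, device=X.device, dtype=X.dtype))
    return X - lr * (manifold_term + lam * penalty_term)
```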