Distributed Retraction-Free and Communication-Efficient Optimization on the Stiefel Manifold

Authors: Yilong Song, Peijin Li, Bin Gao, Kun Yuan

ICML 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "Extensive numerical experiments validate our theoretical results. To validate the performance of EF-Landing, we provide experiments on two groups of problems: the distributed online PCA for the deterministic scenario, and deep learning using the ResNet-18 (He et al., 2016) neural network architecture with orthogonal constraints applied to the convolutional layers for the stochastic scenario."

Researcher Affiliation | Academia
  "1 Peking University, 2 Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Correspondence to: Kun Yuan <EMAIL>."

Pseudocode | Yes
  "Algorithm 1 EF-Landing"

Open Source Code | No
  "All the experiments were implemented in PyTorch and performed using a single GPU. Further experiments can be found in Appendix D."

Open Datasets | Yes
  "We tested the performance of EF-Landing on the CIFAR-10 dataset... The MNIST dataset... The CIFAR-10 dataset... It requires citation (Krizhevsky et al., 2009) for usage. The MNIST dataset only has a citation requirement (LeCun et al., 2010)."

Dataset Splits | Yes
  "The MNIST dataset... containing 60,000 training data samples and 10,000 test data samples. The CIFAR-10 dataset... is comprised of 50,000 training samples and 10,000 testing samples."

Hardware Specification | No
  "All the experiments were implemented in PyTorch and performed using a single GPU."

Software Dependencies | No
  "All the experiments were implemented in PyTorch and performed using a single GPU."

Experiment Setup | Yes
  "For the EF-Landing algorithm, the penalty parameter λ was set to 1, and we used three compressors: Top-K and Rand-K with compression retention ratio 0.1, and QSGD with quantization level s = 16. The experiment involved 600 iterations for all algorithms with a fixed learning rate... For each algorithm, the network is trained for 150 epochs, and the learning rate is reduced to 0.1 of its original value after the 100th epoch. Additional experimental details, including the choice of hyper-parameters such as momentum and step size, can be found in Appendix D."
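The three compressors named in the setup are standard operators from the communication-efficient optimization literature; the paper's exact implementations are not reproduced on this page. The following is a minimal sketch of how Top-K and Rand-K (retention ratio 0.1) and QSGD (s = 16 levels) are commonly defined — the function names and the unbiasedness rescaling in Rand-K are assumptions, not quotes from the paper.

```python
import numpy as np

def top_k(g, ratio=0.1):
    """Top-K: keep only the k largest-magnitude entries (biased compressor)."""
    flat = g.ravel()
    k = max(1, int(ratio * flat.size))
    out = np.zeros_like(flat)
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |g_i|
    out[idx] = flat[idx]
    return out.reshape(g.shape)

def rand_k(g, ratio=0.1, rng=None):
    """Rand-K: keep k uniformly random entries; the size/k rescaling makes
    the compressor unbiased in expectation (a common convention, assumed here)."""
    rng = rng or np.random.default_rng()
    flat = g.ravel()
    k = max(1, int(ratio * flat.size))
    out = np.zeros_like(flat)
    idx = rng.choice(flat.size, size=k, replace=False)
    out[idx] = flat[idx] * (flat.size / k)
    return out.reshape(g.shape)

def qsgd(g, s=16, rng=None):
    """QSGD: stochastically round each |g_i| / ||g|| onto s uniform levels,
    keeping the sign and the overall norm."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    level = np.abs(g) / norm * s          # position in [0, s]
    lower = np.floor(level)
    q = lower + (rng.random(g.shape) < level - lower)  # round up w.p. (level - lower)
    return norm * np.sign(g) * q / s
```

Top-K is biased, which is precisely why an error-feedback mechanism (the "EF" in EF-Landing) is typically paired with it; Rand-K and QSGD are unbiased by construction.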
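Algorithm 1 itself is only named in the table, so the sketch below combines the two ingredients the method's name points to: a retraction-free landing direction on the Stiefel manifold (in the style of Ablin & Peyré's landing method, with penalty parameter λ as in the setup above) wrapped in an error-feedback loop around a generic compressor. The update rule, step size, and variable names are assumptions for illustration, not the authors' exact pseudocode.

```python
import numpy as np

def skew(A):
    """Skew-symmetric part of a square matrix."""
    return (A - A.T) / 2

def landing_field(X, G, lam=1.0):
    """Retraction-free landing direction for an n x p matrix X with
    target constraint X^T X = I_p: a relative-gradient term plus
    lam times the gradient of the penalty N(X) = (1/4)||X^T X - I||_F^2."""
    p = X.shape[1]
    return skew(G @ X.T) @ X + lam * X @ (X.T @ X - np.eye(p))

def ef_landing_step(X, G, e, compress, eta=0.05, lam=1.0):
    """One error-feedback step (sketch): compress the landing direction
    plus the accumulated residual, transmit the compressed part, and
    keep the new residual locally for the next iteration."""
    d = landing_field(X, G, lam)
    c = compress(d + e)      # what a worker would actually communicate
    e_new = (d + e) - c      # compression error fed back next round
    return X - eta * c, e_new
```

With the identity compressor and zero gradient, repeated steps amount to gradient descent on the feasibility penalty alone, so iterates drift back toward the Stiefel manifold without ever computing a retraction — which is the point of landing-type schemes.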