Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization

Authors: Zeou Hu, Yaoliang Yu

ICLR 2025

Reproducibility variables, each with its result and the supporting LLM response:
Research Type: Experimental
LLM Response: "Lastly, we validate our approach through experiments on both synthetic examples and realistic application scenarios where distinct function-variable dependency structures appear. Our results highlight the importance of exploiting function-variable structure in gradient-based MOO, and provide a seamless enhancement to existing approaches." "We validate the effectiveness of our approach through experiments on synthetic and benchmark datasets, showcasing its application to ML scenarios with diverse dependency structures."
Researcher Affiliation: Academia
LLM Response: Zeou Hu (University of Waterloo, Vector Institute; email redacted); Yaoliang Yu (University of Waterloo, Vector Institute; email redacted)
Pseudocode: Yes
LLM Response: "Algorithm 1: Multiple Gradient Descent Algorithm with Refined Partition (RP-MGDA)"; "Algorithm 2: REFINED_PARTITION"
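The paper's Algorithm 1 is not reproduced in this report. Purely as an illustration of the block-wise idea behind RP-MGDA, the following is a minimal sketch (not the authors' implementation) of an MGDA-style step applied per variable block; the `blocks` encoding and the restriction to at most two objectives per block are assumptions made for brevity.

```python
import numpy as np

def min_norm_2(g1, g2):
    # Closed-form minimum-norm point of conv{g1, g2}, the standard
    # two-objective MGDA subroutine.
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return g1
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

def rp_mgda_step(x, grads, blocks, eta=0.01):
    # One block-wise MGDA-style step: for each variable block, combine only
    # the gradients of the objectives that actually depend on that block.
    # `blocks` is a list of (index_array, objective_ids) pairs -- a
    # hypothetical encoding, not the paper's data structure.
    x = x.copy()
    for idx, objs in blocks:
        gs = [grads[i][idx] for i in objs]
        d = gs[0] if len(gs) == 1 else min_norm_2(gs[0], gs[1])
        x[idx] = x[idx] - eta * d
    return x
```

When a block is touched by a single objective the step reduces to plain gradient descent on that objective, which is the intuition behind refining the partition: fewer objectives per block means less gradient conflict.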
Open Source Code: Yes
LLM Response: "Our code is available at https://github.com/watml/RP-MGDA."
Open Datasets: Yes
LLM Response: "We consider a personalized federated learning setting with m = 4 clients, each holding distinct non-i.i.d. data Di sampled from the training dataset (MNIST/CIFAR-10)." "Specifically, we consider an m = 4-objective learning problem on the California housing dataset (Pace and Barry, 1997)."
Dataset Splits: Yes
LLM Response: "In order to create a non-i.i.d. dataset, we follow a similar sampling procedure as in McMahan et al. (2017): first we sort all data points according to their classes. Then, they are split consecutively into shards (300 shards for MNIST, 250 shards for CIFAR-10), with 200 images per shard; each shard contains images from only one class. Each client is randomly assigned 10 different shards, totaling 2000 instances per client. The data distribution for each client varies, with each having access to different subsets of class labels. We take a smaller subset of the original CIFAR-10 dataset as our training set. We standardize each feature to have zero mean and unit variance to improve training."
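The shard procedure quoted above can be sketched in a few lines. This is a rough re-implementation of the McMahan-style non-i.i.d. split, not the paper's code; the function name `shard_split` and the argument layout are illustrative.

```python
import numpy as np

def shard_split(labels, n_shards, shard_size, shards_per_client, n_clients, seed=0):
    # Sort all points by class label, cut the sorted order into
    # single-class shards, then hand each client a disjoint random
    # set of shards (the non-i.i.d. split described above).
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")
    shards = [order[i * shard_size:(i + 1) * shard_size] for i in range(n_shards)]
    picks = rng.permutation(n_shards)
    return [
        np.concatenate([shards[s] for s in
                        picks[c * shards_per_client:(c + 1) * shards_per_client]])
        for c in range(n_clients)
    ]
```

With the MNIST numbers quoted above this would be `shard_split(labels, n_shards=300, shard_size=200, shards_per_client=10, n_clients=4)`, giving each client 2000 instances drawn from a small subset of classes.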
Hardware Specification: No
LLM Response: The paper does not describe the specific hardware used to run its experiments; it only mentions the general experimental setups.
Software Dependencies: No
LLM Response: The paper mentions 'networkx' as the package used for detecting cycles but does not give a version number; no other software components are listed with specific versions.
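The cycle check that the paper reportedly delegates to networkx can also be done with the standard library. Below is a small dependency-free sketch of directed-cycle detection via iterative depth-first search; the `{node: [successors]}` graph encoding is an assumption of this sketch, not the paper's representation.

```python
def has_cycle(adj):
    # Iterative DFS three-coloring: a GRAY node reached again along the
    # current path is a back edge, i.e. a directed cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    for start in adj:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(adj[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                c = color.get(nxt, WHITE)
                if c == GRAY:
                    return True  # back edge found
                if c == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(adj.get(nxt, []))))
                    advanced = True
                    break
            if not advanced:
                color[node] = BLACK  # all successors explored
                stack.pop()
    return False
```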
Experiment Setup: Yes
LLM Response: "We run simulations with step size η = 0.01 and 1000 iterations." "We adopt a warm-start procedure by first running RP-MGDA for 50 epochs on a given random initialization, using the resulting weights as the starting point for both MGDA and RP-MGDA training. We also employ a periodic exponential decay learning rate scheme for the CIFAR-10 experiments. For MNIST, we run 1000 epochs with lr = 0.1 for both algorithms; for CIFAR-10, we run 5000 epochs with lr = 0.05 and a step decay of 0.9 every 200 epochs for both algorithms. Mini-batch size = 200."
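The CIFAR-10 step-decay schedule quoted above (start at lr = 0.05, multiply by 0.9 every 200 epochs) amounts to a one-liner; the name `cifar10_lr` is illustrative, not from the paper.

```python
def cifar10_lr(epoch, base_lr=0.05, decay=0.9, period=200):
    # Periodic exponential (step) decay: the rate drops by a factor
    # of `decay` at every `period`-epoch boundary.
    return base_lr * decay ** (epoch // period)
```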