Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization

Authors: Hansi Yang, James Kwok

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation across various datasets with different multi-objective optimization algorithms further demonstrates that JoGBa can achieve faster convergence and superior final performance than other data ordering strategies. ... Figure 2: Training losses (objective values) of different tasks on NYUv2 data with different data ordering methods. ... Table 1: Test performance (averaged over 3 random seeds) for the three tasks on NYUv2 data.
Researcher Affiliation | Academia | Hansi Yang, James Kwok, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Pseudocode | Yes | Algorithm 1: JoGBa: Joint Gradient Balancing for Multi-Objective Optimization. ... Algorithm 2: Online greedy implementation of Balancing(s, g_{m,k,t}).
Open Source Code | No | The paper does not contain any explicit statement about making the code available, nor does it provide a link to a code repository.
Open Datasets | Yes | We consider two data sets that are commonly used for multi-objective optimization in machine learning: (i) NYUv2 (Silberman et al., 2012), an indoor scene data set that involves three different tasks: semantic segmentation, depth estimation, and surface normal prediction. (ii) QM9 (Ramakrishnan et al., 2014), which is a widely used benchmark for graph neural networks predicting 11 properties of molecules.
Dataset Splits | No | The paper mentions using the NYUv2 and QM9 datasets but does not explicitly describe the training, validation, and test splits used for them. It refers to Appendix B for setup details, but Appendix B does not specify splits either.
Hardware Specification | Yes | All experiments are conducted on a server with an Intel Xeon Gold 6342 CPU and an NVIDIA RTX A6000 GPU.
Software Dependencies | Yes | We use PyTorch version 1.10.1 with CUDA version 11.7.
Experiment Setup | Yes | Each method is trained for 200 epochs with the Adam optimizer (Kingma & Ba, 2015). We set the learning rate α = 1×10⁻⁴ at the beginning of training, and reduce it to 5×10⁻⁵ after 100 epochs. The batch size is set to 2 for all methods. ... Each method is trained for 300 epochs with the Adam optimizer (Kingma & Ba, 2015), and we set the learning rate α = 1×10⁻⁴ throughout the whole training process. The batch size is set to 120 for all methods.
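The NYUv2 learning-rate schedule quoted above (1×10⁻⁴ for the first 100 epochs, then halved to 5×10⁻⁵) can be sketched as a small helper; the function name `lr_at_epoch` is illustrative and not from the paper.

```python
# Minimal sketch of the reported NYUv2 step-decay schedule, assuming
# 0-indexed epochs and a single decay step at epoch 100.
def lr_at_epoch(epoch: int) -> float:
    """Learning rate used at a given epoch: 1e-4, halved after 100 epochs."""
    return 1e-4 if epoch < 100 else 5e-5
```

In PyTorch (the paper uses version 1.10.1), the same schedule would typically be expressed as `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.5)` wrapped around an Adam optimizer with `lr=1e-4`.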