Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization
Authors: Hansi Yang, James Kwok
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation across various datasets with different multi-objective optimization algorithms further demonstrates that JoGBa can achieve faster convergence and superior final performance than other data ordering strategies. ... Figure 2: Training losses (objective values) of different tasks on NYUv2 data with different data ordering methods. ... Table 1: Test performance (averaged over 3 random seeds) for the three tasks on NYUv2 data. |
| Researcher Affiliation | Academia | Hansi Yang, James Kwok Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China |
| Pseudocode | Yes | Algorithm 1 JoGBa: Joint Gradient Balancing for Multi-Objective Optimization. ... Algorithm 2 Online greedy implementation of Balancing(s, gm,k,t). |
| Open Source Code | No | The paper does not contain any explicit statement about making the code available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We consider two data sets that are commonly used for multi-objective optimization in machine learning: (i) NYUv2 (Silberman et al., 2012), an indoor scene data set that involves three different tasks: semantic segmentation, depth estimation, and surface normal prediction. (ii) QM9 (Ramakrishnan et al., 2014), which is a widely used benchmark for graph neural networks predicting 11 properties of molecules. |
| Dataset Splits | No | The paper mentions using NYUv2 and QM9 datasets but does not explicitly describe the training, validation, and test splits used for these datasets. It refers to Appendix B for setup details, but Appendix B does not specify splits either. |
| Hardware Specification | Yes | All experiments are conducted on a server with an Intel Xeon Gold 6342 CPU and an NVIDIA RTX A6000 GPU. |
| Software Dependencies | Yes | We use PyTorch version 1.10.1 with CUDA version 11.7. |
| Experiment Setup | Yes | Each method is trained for 200 epochs with the Adam optimizer (Kingma & Ba, 2015). We set the learning rate α = 1×10⁻⁴ at the beginning of training, and reduce it to 5×10⁻⁵ after 100 epochs. The batch size is set to 2 for all methods. ... Each method is trained for 300 epochs with the Adam optimizer (Kingma & Ba, 2015) and we set the learning rate α = 1×10⁻⁴ throughout the whole training process. The batch size is set to 120 for all methods. |
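The NYUv2 training configuration quoted above (Adam, learning rate 1×10⁻⁴ halved to 5×10⁻⁵ after 100 of 200 epochs, batch size 2) can be sketched in PyTorch as follows. This is a minimal illustration only: the model is a placeholder stand-in, since the paper's actual architecture is not specified in this excerpt, and the training loop body is elided.

```python
import torch

# Placeholder model; the paper's actual multi-task network is not given here.
model = torch.nn.Linear(8, 3)

# Adam with the initial learning rate from the paper: 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Halve the learning rate after epoch 100 (1e-4 -> 5e-5), matching the
# reported schedule, via MultiStepLR with gamma=0.5.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100], gamma=0.5
)

for epoch in range(200):
    # ... one pass over the NYUv2 loader with batch_size=2 would go here ...
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
print(final_lr)
```

`MultiStepLR` applies the multiplicative factor once the milestone epoch is passed, so the learning rate for epochs 100–199 is exactly 5×10⁻⁵.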