Gradient Aligned Regression via Pairwise Losses
Authors: Dixian Zhu, Tianbao Yang, Livnat Jerby
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed method practically on two synthetic datasets and on eight extensive real-world tasks from six benchmark datasets, against eight other competitive baselines. Running time experiments demonstrate the superior efficiency... Additionally, ablation studies confirm the effectiveness of each component of GAR. |
| Researcher Affiliation | Academia | 1Department of Genetics, Stanford University, CA, USA 2Department of Computer Science, Texas A&M University, TX, USA. Correspondence to: Dixian Zhu <EMAIL>, Livnat Jerby <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gradient Aligned Regression (GAR). Require: hyper-parameter α for balancing sub-losses; training dataset D = {(x_i, y_i)}_{i=1}^N. Initialize model f(·). for t = 1 to T do: Sample mini-batch of data {(x_i, y_i)}_{i∈B_t}. Compute MAE loss L_c^MAE. Compute the losses for the derivative: L_diff^MSE and L_diffnorm^{p=2} by Eq. 5 and Eq. 7. Compute the GAR (KL) loss L_GAR^KL by Eq. 11. Utilize the SGD or Adam optimizer to optimize the model with gradient ∇_f L_GAR^KL. end for |
| Open Source Code | Yes | The code is open sourced at https://github.com/DixianZhu/GAR. |
| Open Datasets | Yes | 1) Concrete Compressive Strength (Yeh, 1998): predicting the compressive strength of high-performance concrete. 2) Wine Quality (Cortez et al., 2009): predicting wine quality based on physicochemical test values (such as acidity, sugar, chlorides, etc.). 3) Parkinson (Total) (Tsanas et al., 2009)... 5) Super Conductivity (Hamidieh, 2018)... 6) IC50 (Garnett et al., 2012)... 7) Age DB (Scratch) (Moschoglou et al., 2017) |
| Dataset Splits | Yes | For tabular datasets, we uniformly at random split off 20% of the data for testing; the remaining 80% is used for training and validation, where we conduct 5-fold cross-validation with the random seed set to 123. |
| Hardware Specification | Yes | We run all compared methods sequentially on an exclusive cluster node with AMD EPYC 7402 24-Core Processor 2.0 GHz. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'SGD with momentum' and specifies various hyperparameters for different methods but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The total number of training epochs is set to 100 and the batch size to 256. The weight decay for each method is tuned in {1e-3, 1e-4, 1e-5}; we utilize the SGD optimizer with momentum (set to 0.9) and tune the initial learning rate for the baseline method in {1e-1, 1e-2, 1e-3, 1e-4, 1e-5}, which is decreased stage-wise by a factor of 10 at the end of the 50th and 75th epochs. The switching hyper-parameter δ for the Huber loss and the scaling hyper-parameter β for the Focal (MAE) or Focal (MSE) loss are tuned in {0.25, 1, 4}. The interpolation hyper-parameter λ for RankSim is tuned in {0.5, 1, 2} and the balancing hyper-parameter γ is fixed as 100, as suggested by the sensitivity study in their Appendix C.4 (Gong et al., 2022). The temperature hyper-parameter for RNC is tuned in {1, 2, 4}; the first 50 epochs are used for RNC pre-training and the remaining 50 epochs for fine-tuning with the MAE loss. For ConR, the linear combination hyper-parameter α is fixed as 1 and β is tuned in {0.2, 1, 4}, as suggested by the ablation studies in their Appendix A.5 (Keramati et al., 2023). The robust reconciliation hyper-parameter α for GAR is tuned in {0.1, 1, 10}. |
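The Algorithm 1 row references pairwise difference losses (Eq. 5, Eq. 7, Eq. 11) whose exact forms are not reproduced in this report. Below is a minimal sketch of the two named ingredients, the point-wise MAE term and a pairwise difference MSE over a mini-batch, assuming the pairwise term penalizes mismatch between predicted and true pairwise differences. The function names, the exact pairwise form, and the simple additive combination are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def mae_loss(pred, y):
    # Point-wise mean absolute error, L_c^MAE in the report's notation.
    return np.abs(pred - y).mean()

def pairwise_diff_mse(pred, y):
    # Assumed form of a pairwise term like L_diff^MSE (Eq. 5): match
    # predicted differences (p_i - p_j) to target differences (y_i - y_j)
    # over all pairs in the mini-batch.
    dp = pred[:, None] - pred[None, :]
    dy = y[:, None] - y[None, :]
    return ((dp - dy) ** 2).mean()

def gar_loss_sketch(pred, y, alpha=1.0):
    # Illustrative combination with a balancing hyper-parameter alpha; the
    # actual GAR (KL) reconciliation (Eq. 11) is not specified in this report.
    return mae_loss(pred, y) + alpha * pairwise_diff_mse(pred, y)

pred = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 2.0, 3.0])
total = gar_loss_sketch(pred, y, alpha=1.0)
```

Minimizing the pairwise term aligns the model's predicted differences (a discrete analogue of its gradient direction) with the targets' differences, which is the intuition the paper's title points at.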
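The stage-wise learning-rate schedule in the setup row (initial rate divided by 10 at the end of epochs 50 and 75, over 100 epochs) can be sketched as a small helper; the function name and signature are illustrative, not taken from the paper's code:

```python
def stagewise_lr(epoch, lr0, milestones=(50, 75), factor=0.1):
    """Learning rate for a 1-indexed epoch: with the defaults, lr0 is used
    through epoch 50, lr0/10 through epoch 75, and lr0/100 afterwards."""
    drops = sum(1 for m in milestones if epoch > m)
    return lr0 * factor ** drops

# Full 100-epoch schedule for an initial learning rate of 1e-1.
schedule = [stagewise_lr(e, 1e-1) for e in range(1, 101)]
```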
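The split protocol in the Dataset Splits row (a uniform random 20% test split, 5-fold cross-validation on the remaining 80%, seed 123) can be reproduced with the standard library alone; the helper name is illustrative and this is not the paper's actual code:

```python
import random

def split_indices(n, test_frac=0.2, n_folds=5, seed=123):
    # Shuffle indices with a fixed seed, hold out test_frac for testing,
    # and partition the rest into n_folds cross-validation folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return test, folds

test, folds = split_indices(1000)
```

Fixing the seed makes the split deterministic, so the same test set and folds are used for every compared method.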