Gradient Aligned Regression via Pairwise Losses
Authors: Dixian Zhu, Tianbao Yang, Livnat Jerby
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed method practically on two synthetic datasets and on eight extensive real-world tasks from six benchmark datasets, against eight other competitive baselines. Running time experiments demonstrate the superior efficiency... Additionally, ablation studies confirm the effectiveness of each component of GAR. |
| Researcher Affiliation | Academia | 1Department of Genetics, Stanford University, CA, USA 2Department of Computer Science, Texas A&M University, TX, USA. Correspondence to: Dixian Zhu <EMAIL>, Livnat Jerby <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Gradient Aligned Regression (GAR). Require: hyper-parameter α for balancing sub-losses; training dataset D = {(x_i, y_i)}_{i=1}^N. Initialize model f(·). for t = 1 to T do: Sample mini-batch of data {(x_i, y_i)}_{i∈B_t}. Compute MAE loss L_c^MAE. Compute the losses for the derivative: L_diff^MSE and L_diffnorm^{p=2} by Eq. 5 and Eq. 7. Compute the GAR (KL) loss L_GAR^KL by Eq. 11. Utilize the SGD or Adam optimizer to optimize the model with gradient ∇_f L_GAR^KL. end for |
| Open Source Code | Yes | The code is open sourced at https://github.com/DixianZhu/GAR. |
| Open Datasets | Yes | 1) Concrete Compressive Strength (Yeh, 1998): predicting the compressive strength of high-performance concrete. 2) Wine Quality (Cortez et al., 2009): predicting wine quality based on physicochemical test values (such as acidity, sugar, chlorides, etc.). 3) Parkinson (Total) (Tsanas et al., 2009)... 5) Super Conductivity (Hamidieh, 2018)... 6) IC50 (Garnett et al., 2012)... 7) Age DB (Scratch) (Moschoglou et al., 2017) |
| Dataset Splits | Yes | For tabular datasets, we uniformly at random split off 20% of the data for testing; the remaining 80% is used for training and validation, where we conduct 5-fold cross-validation with the random seed set to 123. |
| Hardware Specification | Yes | We run all compared methods sequentially on an exclusive cluster node with AMD EPYC 7402 24-Core Processor 2.0 GHz. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'SGD with momentum' and specifies various hyperparameters for different methods but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The total number of training epochs is set to 100 and the batch size to 256. The weight decay for each method is tuned in {1e-3, 1e-4, 1e-5}; we utilize the SGD optimizer with momentum (set to 0.9) and tune the initial learning rate for the baseline method in {1e-1, 1e-2, 1e-3, 1e-4, 1e-5}, which is decreased stage-wise by a factor of 10 at the end of the 50th and 75th epochs. The switching hyper-parameter δ for the Huber loss and the scaling hyper-parameter β for the Focal (MAE) or Focal (MSE) loss are tuned in {0.25, 1, 4}. The interpolation hyper-parameter λ for RankSim is tuned in {0.5, 1, 2} and the balancing hyper-parameter γ is fixed as 100, as suggested by the sensitivity study in their Appendix C.4 (Gong et al., 2022). The temperature hyper-parameter for RNC is tuned in {1, 2, 4}; the first 50 epochs are used for RNC pre-training and the remaining 50 epochs for fine-tuning with the MAE loss. For ConR, the linear combination hyper-parameter α is fixed as 1 and β is tuned in {0.2, 1, 4}, as suggested by the ablation studies in their Appendix A.5 (Keramati et al., 2023). The robust reconciliation hyper-parameter α for GAR is tuned in {0.1, 1, 10}. |
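The Algorithm 1 row references pairwise difference losses (Eq. 5, Eq. 7, Eq. 11) whose exact forms are not reproduced in this report. Below is a minimal sketch of the two named ingredients, the point-wise MAE term and a pairwise difference MSE over a mini-batch, assuming the pairwise term penalizes mismatch between predicted and true pairwise differences. The function names, the exact pairwise form, and the simple additive combination are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def mae_loss(pred, y):
    # Point-wise mean absolute error, L_c^MAE in the report's notation.
    return np.abs(pred - y).mean()

def pairwise_diff_mse(pred, y):
    # Assumed form of a pairwise term like L_diff^MSE (Eq. 5): match
    # predicted differences (p_i - p_j) to target differences (y_i - y_j)
    # over all pairs in the mini-batch.
    dp = pred[:, None] - pred[None, :]
    dy = y[:, None] - y[None, :]
    return ((dp - dy) ** 2).mean()

def gar_loss_sketch(pred, y, alpha=1.0):
    # Illustrative combination with a balancing hyper-parameter alpha; the
    # actual GAR (KL) reconciliation (Eq. 11) is not specified in this report.
    return mae_loss(pred, y) + alpha * pairwise_diff_mse(pred, y)

pred = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 2.0, 3.0])
total = gar_loss_sketch(pred, y, alpha=1.0)
```

Minimizing the pairwise term aligns the model's predicted differences (a discrete analogue of its gradient direction) with the targets' differences, which is the intuition the paper's title points at.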
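The stage-wise learning-rate schedule in the setup row (initial rate divided by 10 at the end of epochs 50 and 75, over 100 epochs) can be sketched as a small helper; the function name and signature are illustrative, not taken from the paper's code:

```python
def stagewise_lr(epoch, lr0, milestones=(50, 75), factor=0.1):
    """Learning rate for a 1-indexed epoch: with the defaults, lr0 is used
    through epoch 50, lr0/10 through epoch 75, and lr0/100 afterwards."""
    drops = sum(1 for m in milestones if epoch > m)
    return lr0 * factor ** drops

# Full 100-epoch schedule for an initial learning rate of 1e-1.
schedule = [stagewise_lr(e, 1e-1) for e in range(1, 101)]
```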
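The split protocol in the Dataset Splits row (a uniform random 20% test split, 5-fold cross-validation on the remaining 80%, seed 123) can be reproduced with the standard library alone; the helper name is illustrative and this is not the paper's actual code:

```python
import random

def split_indices(n, test_frac=0.2, n_folds=5, seed=123):
    # Shuffle indices with a fixed seed, hold out test_frac for testing,
    # and partition the rest into n_folds cross-validation folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return test, folds

test, folds = split_indices(1000)
```

Fixing the seed makes the split deterministic, so the same test set and folds are used for every compared method.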