Supervised Score-Based Modeling by Gradient Boosting

Authors: Changyuan Zhao, Hongyang Du, Guangyuan Liu, Dusit Niyato

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Via ablation experiments on selected examples, we demonstrate the outstanding performance of the proposed techniques. Additionally, we compare our model with other probabilistic models, including Natural Gradient Boosting (NGBoost), Classification and Regression Diffusion Models (CARD), Diffusion Boosted Trees (DBT), and non-probabilistic gradient boosting models. Experiments on regression and classification tasks show that SSM outperforms existing methods in accuracy and significantly shortens the inference time.
Researcher Affiliation Academia ¹College of Computing and Data Science, Nanyang Technological University; ²CNRS@CREATE, 1 Create Way, #08-01 Create Tower, Singapore 138602; ³Department of Electrical and Electronic Engineering, University of Hong Kong; ⁴The Energy Research Institute @ NTU, Interdisciplinary Graduate Program. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes
Algorithm 1: Training
Initialization: noise scales {σ_i}_{i=1}^L, training set D, loss coefficient λ(σ_i)
1: repeat
2:   choose (x, y) ∈ D, σ ∈ {σ_i}_{i=1}^L, and ỹ ~ N(y, σ)
3:   take a gradient descent step on
4:     ∇_θ [ (1/2) λ(σ_i) ‖ s_θ(ỹ, σ, x) + (ỹ − y)/σ² ‖²_2 ]
5: until converged

Algorithm 2: Inference
Initialization: {σ_i}_{i=1}^L, step size ϵ, steps T, thresholds {β(σ_i)}_{i=1}^{L−1}, input x^I
1: initialize y_0
2: for i ← 1 to L − 1 do
3:   α_i ← ϵ · σ_i² / σ_L²
4:   repeat
5:     y_t ← y_{t−1} + α_i · s_θ(y_{t−1}, σ_i, x^I)
6:   until σ_i² ‖s_θ(y_t, σ_i, x^I)‖ < β(σ_i)
7:   y_0 ← y_T
8: end for
9: for t ← 1 to T do
10:   y_t ← y_{t−1} + ϵ · s_θ(y_{t−1}, σ_L, x^I)
11: end for
12: return y_T
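The annealed inference loop of Algorithm 2 can be sketched in NumPy. This is an illustrative reconstruction, not the authors' implementation: the learned gradient-boosted score model s_θ(y, σ, x) is replaced by a closed-form Gaussian score, the stopping thresholds β(σ_i) are optional, and all names and values are this sketch's own assumptions.

```python
import numpy as np

def gaussian_score(y, sigma, mu):
    # Score of N(mu, sigma^2): d/dy log p(y) = -(y - mu) / sigma^2.
    # Stands in here for the learned model s_theta(y, sigma, x).
    return -(y - mu) / sigma**2

def annealed_langevin_inference(score, sigmas, eps, T, beta=None, max_inner=1000):
    """Sketch of Algorithm 2: sweep noise scales sigma_1 > ... > sigma_L
    coarse-to-fine, then take T final steps at the smallest scale.
    `score(y, sigma)` plays the role of s_theta(y, sigma, x^I)."""
    rng = np.random.default_rng(0)
    y = rng.normal(0.0, sigmas[0])                  # initialize y_0
    L = len(sigmas)
    for i in range(L - 1):
        alpha = eps * sigmas[i]**2 / sigmas[-1]**2  # alpha_i = eps * sigma_i^2 / sigma_L^2
        for _ in range(max_inner):                  # "repeat ... until" loop
            y = y + alpha * score(y, sigmas[i])
            # Stop early once sigma_i^2 * |score| drops below beta(sigma_i).
            if beta is not None and sigmas[i]**2 * abs(score(y, sigmas[i])) < beta[i]:
                break
    for _ in range(T):                              # final refinement at sigma_L
        y = y + eps * score(y, sigmas[-1])
    return y
```

With the Gaussian stand-in score, each update contracts y toward the mode, so the loop converges to the target mean; with a learned score model the same loop would walk samples toward high-density regions of p(y | x).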
Open Source Code No The paper does not contain any explicit statement or link indicating the release of its own source code for the methodology described.
Open Datasets Yes we first perform experiments on 5 selected toy examples (linear regression, quadratic regression, log-log linear regression, log-log cubic regression, and sinusoidal regression) proposed in (Han, Zheng, and Zhou 2022). We further evaluate our model on 10 UCI regression tasks (Dua and Graff 2017). For classification tasks, we compare our model with CARD on CIFAR-10 and CIFAR-100, focusing on both accuracy and the inference time (Krizhevsky 2009).
Dataset Splits No The paper states: "We employ the same experimental settings as those used in the CARD model (Han, Zheng, and Zhou 2022)" for UCI regression tasks. For toy examples and CIFAR datasets, it does not explicitly provide the specific percentages or counts for training, validation, and test splits within the paper's text.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup No The paper states: "As discussed in (Song and Ermon 2020), we need to design many parameters to ensure the effectiveness of training and inference, including (i) the choices of noise scales {σ_i}_{i=1}^L; (ii) the step size ϵ in Langevin dynamics; (iii) the number of inference steps T in the Langevin equation." It also notes: "We employ the same experimental settings as those used in the CARD model (Han, Zheng, and Zhou 2022)" for the UCI tasks. However, the main text does not report the concrete hyperparameter values actually used (e.g., learning rates, batch sizes, epochs, or the numerical values of {σ_i}_{i=1}^L, ϵ, and T); it instead defers to external work or describes parameter-selection techniques.
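Since the paper defers these choices to Song and Ermon (2020), a reader attempting reproduction would need to pick the noise scales themselves. A common default in that line of work is a geometric sequence between σ_max and σ_min; the sketch below shows that convention, with endpoint values that are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def geometric_noise_scales(sigma_max, sigma_min, L):
    """Geometric sequence sigma_1 > ... > sigma_L, the standard noise
    schedule in score-based modeling (Song and Ermon 2020). The actual
    values used by SSM are not reported in the paper."""
    return np.geomspace(sigma_max, sigma_min, L)
```

Example: `geometric_noise_scales(1.0, 0.01, 10)` yields 10 scales decaying by a constant ratio from 1.0 down to 0.01, which would then feed the {σ_i} slots of Algorithms 1 and 2.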