Unsupervised Tree Boosting for Learning Probability Distributions
Authors: Naoki Awaya, Li Ma
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct numerical experiments to demonstrate and evaluate our method. We start with a set of simulations under several representative forms of density functions in a 48-dimensional sample space, through which we examine the impact of the tuning parameters, namely the learning rate, which controls global shrinkage; the scale-specific shrinkage parameter, which tunes the shrinkage on tree nodes of different sizes; and the number of trees, as well as the effect of the two-stage strategy, on the performance of the algorithm. In addition, we compare the predictive performance of our method with a state-of-the-art single-tree learner as well as the closely related tree density destructor (Inouye and Ravikumar, 2018). Throughout, the performance of various methods is evaluated by the predictive score, i.e., the average fitted log-density on a testing set. We then provide a comparative study based on several popular benchmark data sets that pits our method against several state-of-the-art NF methods, including both deep-learning based NFs and the tree density destructor. |
| Researcher Affiliation | Academia | Naoki Awaya EMAIL School of Political Science and Economics Waseda University Shinjuku City, Tokyo 169-8050, Japan; Li Ma EMAIL Department of Statistical Science Duke University Durham, NC 27708, USA |
| Pseudocode | Yes | The algorithm is summarized below. Initialization: set r^(0) = (x_1, ..., x_n). Forward-stagewise fitting: repeat the following steps for k = 1, ..., K: 1. Fit a weak learner that produces a pair of outputs (G_k, T_k) to the residualized observations r^(k-1), where T_k is an inferred partition tree and G_k is the tree-CDF for a measure G_k ∈ P_{T_k}. 2. Update the residuals r^(k) = (r_1^(k), ..., r_n^(k)), where r_i^(k) = G_k(r_i^(k-1)). |
| Open Source Code | Yes | An R package for our method is available at https://github.com/MaStatLab/boostPM and the code for the numerical examples is at https://github.com/MaStatLab/boostPM_experiments. |
| Open Datasets | Yes | We evaluate the performance of our boosting algorithm using seven popular benchmark data sets recorded in the University of California, Irvine (UCI) machine learning repository (Dua and Graff, 2017). We preprocessed the four data sets (POWER, GAS, HEPMASS, and MINIBOONE) following Papamakarios et al. (2017) with the code provided at https://github.com/gpapamak/maf and the three data sets (AReM, CASP, and BANK) with the code provided at https://zenodo.org/record/4560982#.Yh4k_OiZOCo. ... We next compute the variable importance on the MNIST handwritten digits data (LeCun et al., 1998). |
| Dataset Splits | Yes | We set the sample size n of both training and testing data sets to 10,000. ... In each iteration of the algorithm, we use a portion of the data, for example, 90% of the current residuals, to fit the next weak learner, and use the rest of the data to evaluate this tree measure by computing the average log densities. |
| Hardware Specification | Yes | The evaluation of the computational cost is done in a single-core AMD EPYC 7002 (2.50GHz) environment. |
| Software Dependencies | No | The paper mentions the use of an "R package dslabs (Irizarry and Gill, 2021)" and external code repositories for comparisons, but it does not specify explicit version numbers for these or other critical software dependencies like programming languages or deep learning frameworks. |
| Experiment Setup | Yes | In the first experiment, we set the value of c0, the global shrinkage parameter, to 0.01, 0.1 and 0.99 while setting γ, the scale-specific shrinkage parameter, to 0. ... Unless otherwise noted, we adopt the two-stage strategy and the adaptive stopping discussed in Section 2.7.4 and in Section 2.7.5, respectively, and set the maximum number of trees for the first stage (estimation of the marginal distributions) to 100 per dimension and for the second stage (estimation of the dependence structures) to 5,000. |
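The forward-stagewise fitting loop quoted in the Pseudocode row can be illustrated with a minimal sketch. Note the assumptions: the paper's weak learner is a tree-CDF over an inferred partition tree, which we replace here with a simple per-dimension empirical-CDF transform purely as a stand-in; the function names (`fit_marginal_cdf`, `boost_to_uniform`) are hypothetical and not part of the authors' R package.

```python
import numpy as np

def fit_marginal_cdf(r):
    """Stand-in 'weak learner': a per-dimension empirical CDF.

    The actual method instead fits a tree-CDF G_k associated with an
    inferred partition tree T_k; this substitute only illustrates the
    residualization mechanics, not the paper's estimator.
    """
    sorted_cols = [np.sort(r[:, j]) for j in range(r.shape[1])]
    n = r.shape[0]

    def cdf(x):
        out = np.empty_like(x, dtype=float)
        for j, col in enumerate(sorted_cols):
            ranks = np.searchsorted(col, x[:, j], side="right")
            out[:, j] = (ranks + 0.5) / (n + 1.0)  # keep values inside (0, 1)
        return out

    return cdf

def boost_to_uniform(x, n_learners=5):
    """Forward-stagewise fitting: r^(0) = x, then r^(k) = G_k(r^(k-1)).

    Each round fits a weak learner to the current residuals and pushes
    them through its CDF, driving them toward Uniform[0,1]^d.
    """
    r = x.copy()
    learners = []
    for _ in range(n_learners):
        G = fit_marginal_cdf(r)  # step 1: fit weak learner to r^(k-1)
        r = G(r)                 # step 2: residualize
        learners.append(G)
    return r, learners

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
r, learners = boost_to_uniform(x)
# r now lies in (0, 1)^2 and is close to uniform in each coordinate.
```

In the paper, the composition of the fitted tree-CDFs defines the density estimate (by the change-of-variables formula); the sketch above only shows the residual-update direction of the loop.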