Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Authors: Tennison Liu, Nicolas Huynh, Mihaela van der Schaar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate across various benchmarks that LLEGO evolves superior-performing trees compared to existing tree induction methods, and exhibits significantly more efficient search performance compared to conventional GP approaches. Empirically, on a wide range of classification and regression tabular benchmarks, we demonstrate that LLEGO significantly improves search efficiency and consistently evolves trees with superior generalization performance. |
| Researcher Affiliation | Academia | Tennison Liu, Nicolas Huynh & Mihaela van der Schaar, DAMTP, University of Cambridge, Cambridge, UK |
| Pseudocode | No | The paper includes a 'LLEGO Overview' diagram (Figure 1) which illustrates the algorithm's flow, but it is a visual representation of steps rather than a formal pseudocode block with structured variables, loops, and conditional statements. The 'END-TO-END ALGORITHM' in Section 3.4 describes the process in prose. |
| Open Source Code | Yes | We provide the code to reproduce our results at https://github.com/nicolashuynh/LLEGO, and https://github.com/tennisonliu/LLEGO. Also available at the wider lab repository https://github.com/vanderschaarlab/LLEGO. |
| Open Datasets | Yes | We empirically evaluate LLEGO's ability to find performant decision trees for 12 open-source tabular datasets from OpenML curated benchmarks (Vanschoren et al., 2014), including 7 classification and 5 regression datasets. These datasets were selected based on the number of features, samples and the presence of semantically meaningful feature names and descriptions. We provide further details on this selection of datasets and preprocessing in Appendix C.1. |
| Dataset Splits | Yes | We preprocess the dataset using a train-validation-test split ratio of [0.2, 0.4, 0.4]. The low training split is used to accentuate the difference in performance as given sufficient training data, all methods perform comparably. For each run, we only vary the seed used for data splitting, such that for seed 0, we use train_test_split(seed=0). |
| Hardware Specification | Yes | We run all experiments on an AMD EPYC 7V13 64-Core Processor. |
| Software Dependencies | Yes | For our experiments, we use gpt-35-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. |
| Experiment Setup | Yes | For our instantiation of LLEGO in Section 5, we use N = 25 and G = 25. We seed the algorithm with a population of trees generated by CART, where each tree is fitted on 25% of the D_train. We use the same population to initialize GATree. In each iteration, we generate 25 crossover offspring and 25 mutation offspring... We use the elitism selection to preserve the top 25 trees... To compute the desired fitness, we use α = 0.1... We use τ = 10 for diversity guidance. For each genetic operation, we use λ = 4 parent trees. For our experiments, we use gpt-35-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. Hyperparameter tuning. We use Optuna (Akiba et al., 2019) and the default Tree-Parzen Estimator for hyperparameter tuning (HPT) (Watanabe, 2023). For all baselines, we permit wall-clock time to a maximum of 10 minutes. This allows 50 iterations of HPT for CART and C4.5, and 10 iterations for the computationally more intensive DL8.5, GOSDT, and GATree. In each iteration of HPT, we evaluate the objective on the validation set, selecting the best configuration to evaluate on the test set. Table 5: Hyperparameter search ranges. |
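
The Dataset Splits row above describes a seeded train-validation-test split with ratios [0.2, 0.4, 0.4], where only the seed changes between runs. The sketch below illustrates that idea using the Python standard library only. It is not the authors' preprocessing code: the quoted text refers to `train_test_split`, whereas the helper `split_indices`, its parameters, and the sample count of 1000 are hypothetical names and values chosen for illustration.

```python
import random


def split_indices(n_samples, seed, ratios=(0.2, 0.4, 0.4)):
    """Shuffle range(n_samples) with `seed` and cut it into train/val/test index lists.

    Hypothetical helper: a stdlib stand-in for a seeded three-way split.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle reproducible per seed
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


# Assumed example: 1000 samples, seed 0 (one run of the experiment).
train_idx, val_idx, test_idx = split_indices(n_samples=1000, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

Repeating the call with seeds 1, 2, and so on would give one split per run, and the resulting index lists can be used to slice a feature matrix and target vector. Assigning the remainder to the test set means rounding never discards samples.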