On the Power of Learning-Augmented Search Trees

Authors: Jingbang Chen, Xinyuan Cao, Alicia Stepin, Li Chen

ICML 2025

Reproducibility checklist (Variable — Result — LLM Response):
Research Type — Experimental
"We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures." Section 5 (Experiments) compares the learning-augmented Treap (learn-bst) with the learning-augmented BST of Lin et al. (2022) (learn-llw), the learning-augmented skip list of Fu et al. (2025) (learn-skiplist), and classical search trees: Red-Black Trees (red-black), AVL Trees (avl), Splay Trees (splay), B-trees of order 3 (b-tree), and randomized Treaps (random-treap).
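The access-cost metric used in these comparisons (total key comparisons during searches) can be illustrated with a minimal sketch. The structure below is a plain randomized Treap, not the paper's learning-augmented variant, and all names and parameters are illustrative:

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random priority keeps the treap balanced in expectation
        self.left = self.right = None

def rotate_right(t):
    l = t.left
    t.left, l.right = l.right, t
    return l

def rotate_left(t):
    r = t.right
    t.right, r.left = r.left, t
    return r

def insert(t, key):
    """Standard treap insertion: BST on keys, heap on priorities."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    elif key > t.key:
        t.right = insert(t.right, key)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t

def access_cost(t, key):
    """Comparisons made while searching for `key` (counted once per visited node,
    a simplification of the exact comparison count)."""
    cost = 0
    while t is not None:
        cost += 1
        if key == t.key:
            return cost
        t = t.left if key < t.key else t.right
    return cost

random.seed(0)
keys = list(range(1000))
random.shuffle(keys)          # random insertion order, as in the experiments
root = None
for k in keys:
    root = insert(root, k)

# Total access cost over a length-2000 access sequence.
total = sum(access_cost(root, random.randrange(1000)) for _ in range(2000))
print(total)
```

The experiments plot exactly this kind of aggregate: total comparisons on the y-axis as the workload grows.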
Researcher Affiliation — Academia
(1) University of Waterloo, (2) Georgia Institute of Technology, (3) Carnegie Mellon University, (4) Independent. Part of the work by Jingbang Chen and Li Chen was done while at Georgia Tech. Correspondence to: Li Chen <EMAIL>.
Pseudocode — No
The paper describes algorithms and data structures through mathematical definitions and theorems, but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like formatting.
Open Source Code — No
The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets — No
"We consider a synthetic data setting, with n unique items appearing in a sequence of length m. We define the frequency..." The data follows one of three distributions: the Zipfian distribution, the distribution described in Theorem 2.14 (adversarial distribution), and the uniform distribution.
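The Zipfian setting can be reproduced with a short generator sketch (a hypothetical reconstruction, not the authors' code): item i receives weight 1/i^s, weights are normalized, and a length-m access sequence is drawn from the resulting distribution:

```python
import random

def zipf_weights(n, s=1.0):
    """Unnormalized Zipfian weights: item i (1-indexed) gets weight 1 / i**s."""
    return [1.0 / (i ** s) for i in range(1, n + 1)]

def zipf_sequence(n, m, s=1.0, seed=0):
    """Draw a length-m access sequence over items 0..n-1 from a Zipfian distribution."""
    rng = random.Random(seed)
    return rng.choices(range(n), weights=zipf_weights(n, s), k=m)

# Smallest configuration from the paper's grid: n = 1000 unique items, m = 2000 accesses.
seq = zipf_sequence(n=1000, m=2000)
```

The uniform case is the s = 0 degenerate of the same generator; the adversarial distribution of Theorem 2.14 would need the paper's explicit construction.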
Dataset Splits — No
The paper evaluates performance on synthetic data generated according to different distributions. It specifies generation parameters such as "n = 1000 and vary m over [2000, 6000, 10000, 16000, 20000]", but does not define any training, validation, or test splits for machine learning experiments.
Hardware Specification — No
The paper describes experimental results and comparisons but does not provide any specific details regarding the hardware used (e.g., CPU or GPU models, memory, or cloud infrastructure).
Software Dependencies — No
"Specifically, we initialize a uniform distribution and optimize it toward a target distribution with a specified KL divergence level using the Sequential Least Squares Programming (SLSQP) optimizer in SciPy (Virtanen et al., 2020)."
Experiment Setup — Yes
"Experiments are conducted in a similar manner to Lin et al. (2022): (1) All keys are inserted in a random order to avoid insertion order sensitivity. (2) The total access cost is measured by the total number of comparisons needed while accessing keys. We set n = 1000 and vary m over [2000, 6000, 10000, 16000, 20000]. The x-axis represents the number of unique items, and the y-axis denotes the number of comparisons made, which measures access cost. Specifically, we initialize a uniform distribution and optimize it toward a target distribution with a specified KL divergence level using the Sequential Least Squares Programming (SLSQP) optimizer in SciPy (Virtanen et al., 2020)."
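The "specified KL divergence level" step can be sketched without SLSQP: interpolate between the uniform distribution and a target distribution, and bisect on the mixture weight until the mixture's KL divergence from uniform hits the desired level. This is a deliberately simplified stand-in for the paper's SciPy/SLSQP procedure, and all names are illustrative:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix_to_target_kl(target, level, tol=1e-9):
    """Mix uniform with `target` so the mixture's KL divergence from uniform
    equals `level`. KL(mix || uniform) is convex in the mixture weight with
    minimum 0 at weight 0, hence monotone on [0, 1], so bisection applies."""
    n = len(target)
    uniform = [1.0 / n] * n
    assert level <= kl(target, uniform), "requested level exceeds the maximum reachable"
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p = [(1 - mid) * u + mid * t for u, t in zip(uniform, target)]
        if kl(p, uniform) < level:
            lo = mid
        else:
            hi = mid
    return [(1 - lo) * u + lo * t for u, t in zip(uniform, target)]

# Example: a Zipfian target over 100 items, tuned so that KL from uniform is 0.5 nats.
n = 100
w = [1.0 / i for i in range(1, n + 1)]
z = sum(w)
target = [x / z for x in w]
p = mix_to_target_kl(target, 0.5)
```

The paper's setup presumably uses the KL level as a knob for prediction error; this sketch only shows how a distribution at a prescribed divergence from uniform can be manufactured.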