In-Context Adaptation to Concept Drift for Learned Database Operations

Authors: Jiaqi Zhu, Shaofeng Cai, Yanyan Shen, Gang Chen, Fang Deng, Beng Chin Ooi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across key database tasks demonstrate that FLAIR outperforms state-of-the-art baselines, achieving up to 5.2× faster adaptation and reducing error by 22.5% for cardinality estimation.
Researcher Affiliation | Academia | Beijing Institute of Technology, Beijing, China; National University of Singapore, Singapore; Shanghai Jiao Tong University, Shanghai, China; Zhejiang University, Hangzhou, China.
Pseudocode | Yes | Algorithm 1: FLAIR Training; Algorithm 2: Concurrent FLAIR Inference and Adaptation.
Open Source Code | Yes | The code and data of FLAIR are available at https://anonymous.4open.science/r/FLAIR-D4DA/
Open Datasets | Yes | We evaluate FLAIR on two established real-world benchmarks: STATS (STA, 2015) and JOB-light (Leis et al., 2018; 2015). STATS contains over 1 million records, while JOB-light, derived from the IMDB dataset, includes 62 million records.
Dataset Splits | Yes | Our evaluation involves randomly generating 2000 diverse queries with sub-queries to form the training set for each benchmark. In the STATS benchmark, we utilize an existing workload of 146 queries with 2603 sub-queries as the test set. For JOB-light, the test set comprises 70 queries associated with 696 sub-queries. ... We allocate 50% of the original data as the training set and, following prior setups, induce data drift on the remaining data. We designate 20% of the post-drift data as the update set and the remaining post-drift data as the test set.
Hardware Specification | Yes | All the experiments are conducted on a server with a Xeon(R) Silver 4214R CPU @ 2.40GHz (12 cores), 128 GB memory, and a GeForce RTX 3090 with CUDA 11.8.
Software Dependencies | Yes | FLAIR is implemented in Python with PyTorch 2.0.1. ... The experiments involving PostgreSQL are conducted on PostgreSQL 13.1. ... CUDA 11.8.
Experiment Setup | Yes | For FLAIR, the queue size ϱ is set to 80, unless specified otherwise. ... We conduct a sensitivity analysis by varying the number of bins δ used in data encoding. Specifically, we test δ ∈ {10, 20, 40, 60, 80} across four scenarios...
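The Dataset Splits entry describes a two-stage partition for the data-drift experiments: 50% of the original data is kept for training, and the post-drift remainder is divided into a 20% update set and an 80% test set. The sketch below illustrates only that arithmetic. It assumes a seeded random shuffle before splitting (the source does not say whether the split is random or ordered), the function name `split_for_drift` is hypothetical, and the drift-induction step itself is not modeled.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_for_drift(
    records: Sequence[T],
    seed: int = 0,
    train_frac: float = 0.5,
    update_frac: float = 0.2,
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: split records into (train, update, test).

    train_frac of the data stays as pre-drift training data; the rest is
    treated as post-drift data, of which update_frac becomes the update set
    and the remainder the test set. Drift is NOT simulated here.
    """
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)              # assumption: random, not temporal, split

    n_train = int(len(shuffled) * train_frac)
    train, post_drift = shuffled[:n_train], shuffled[n_train:]

    n_update = int(len(post_drift) * update_frac)
    update, test = post_drift[:n_update], post_drift[n_update:]
    return train, update, test


train, update, test = split_for_drift(range(1000))
print(len(train), len(update), len(test))  # 500 100 400
```

For 1,000 records this yields 500 training, 100 update, and 400 test records, so the update set is 10% of the original data and the test set 40%.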
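The Experiment Setup entry fixes FLAIR's queue size ϱ at 80, and Algorithm 2 runs inference and adaptation concurrently. The source does not spell out what the queue stores, so the sketch below assumes a bounded first-in, first-out buffer of the most recent (query, observed result) pairs, where pushing into a full queue evicts the oldest pair. `ContextQueue`, `push`, and `snapshot` are hypothetical names, and the class is not designed for concurrent use; a concurrent version would need a lock around both methods.

```python
from collections import deque
from typing import Deque, List, Tuple


class ContextQueue:
    """Hypothetical FIFO buffer holding the latest rho (input, feedback) pairs."""

    def __init__(self, rho: int = 80) -> None:
        # deque(maxlen=rho) drops the oldest item when a new one is appended
        # to a full buffer, so it always holds the most recent rho pairs.
        self._buf: Deque[Tuple[object, float]] = deque(maxlen=rho)

    def push(self, x: object, y: float) -> None:
        self._buf.append((x, y))

    def snapshot(self) -> List[Tuple[object, float]]:
        # Oldest first; returns a copy so callers cannot mutate the buffer.
        return list(self._buf)

    def __len__(self) -> int:
        return len(self._buf)


q = ContextQueue(rho=80)
for i in range(100):
    q.push(f"query-{i}", float(i))
print(len(q), q.snapshot()[0])  # 80 ('query-20', 20.0)
```

After 100 pushes only the last 80 pairs survive, which is the property a fixed ϱ would give: the context tracks recent data instead of growing without bound.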
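The sensitivity analysis varies the number of bins δ used in data encoding. To show what δ controls, the sketch below assumes a simple equal-width scheme: a numeric column's range is cut into δ equal intervals, and each value is replaced by its bin index, optionally one-hot encoded. This is an illustrative stand-in, not necessarily FLAIR's encoder; `equal_width_bins` and `one_hot` are hypothetical helpers.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], delta: int) -> List[int]:
    """Map each value to a bin index in [0, delta) using equal-width bins."""
    if delta < 1:
        raise ValueError("delta must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything in bin 0
    width = (hi - lo) / delta
    # The maximum value would otherwise land in bin `delta`; clamp it
    # into the last valid bin, delta - 1.
    return [min(int((v - lo) / width), delta - 1) for v in values]


def one_hot(index: int, delta: int) -> List[int]:
    """Encode a bin index as a length-delta one-hot vector."""
    vec = [0] * delta
    vec[index] = 1
    return vec


column = [0.0, 2.5, 5.0, 7.5, 10.0]
print(equal_width_bins(column, 10))  # [0, 2, 5, 7, 9]
print(one_hot(2, 10))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Under this scheme a larger δ gives finer value resolution at the cost of a longer encoding vector, which is the trade-off a sweep over δ ∈ {10, 20, 40, 60, 80} would probe.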