In-Context Adaptation to Concept Drift for Learned Database Operations
Authors: Jiaqi Zhu, Shaofeng Cai, Yanyan Shen, Gang Chen, Fang Deng, Beng Chin Ooi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across key database tasks demonstrate that FLAIR outperforms state-of-the-art baselines, achieving up to 5.2× faster adaptation and reducing error by 22.5% for cardinality estimation. |
| Researcher Affiliation | Academia | 1Beijing Institute of Technology, Beijing, China 2National University of Singapore, Singapore 3Shanghai Jiao Tong University, Shanghai, China 4Zhejiang University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1 FLAIR Training; Algorithm 2 Concurrent FLAIR Inference and Adaptation |
| Open Source Code | Yes | The code and data of FLAIR are available at https://anonymous.4open.science/r/FLAIR-D4DA/ |
| Open Datasets | Yes | We evaluate FLAIR on two established real-world benchmarks: STATS (STA, 2015) and JOB-light (Leis et al., 2018; 2015). STATS contains over 1 million records, while JOB-light, derived from the IMDB dataset, includes 62 million records. |
| Dataset Splits | Yes | our evaluation involves randomly generating 2000 diverse queries with sub-queries to form the training set for each benchmark. In the STATS benchmark, we utilize an existing workload of 146 queries with 2603 sub-queries as the test set. For JOB-light, the test set comprises 70 queries associated with 696 sub-queries. ... We allocate 50% of the original data as the training set, and following prior setups, induced data drift on the remaining data. We designate 20% of the post-drift data as the update set and the remaining post-drift data as the test set. |
| Hardware Specification | Yes | All the experiments are conducted on a server with a Xeon(R) Silver 4214R CPU @ 2.40GHz (12 cores), 128 GB memory, and a GeForce RTX 3090 with CUDA 11.8. |
| Software Dependencies | Yes | FLAIR is implemented in Python with PyTorch 2.0.1. ... The experiments involving PostgreSQL are conducted on PostgreSQL 13.1. ... CUDA 11.8. |
| Experiment Setup | Yes | For FLAIR, the queue size ϱ is set to 80, unless specified otherwise. ... We conduct a sensitivity analysis by varying the number of bins δ used in data encoding. Specifically, we test δ ∈ {10, 20, 40, 60, 80} across four scenarios... |
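The split protocol quoted in the Dataset Splits row (50% of the original data for training, drift induced on the remainder, then a 20%/80% update/test split of the post-drift data) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record array and the `induce_drift` placeholder are hypothetical, since the paper does not specify how drift is injected.

```python
import numpy as np


def induce_drift(data):
    # Placeholder: the paper induces data drift on this portion;
    # returned unchanged here for illustration only.
    return data


def split_with_drift(records, seed=0):
    """Sketch of the quoted protocol: 50% train, drift the rest,
    then 20% of post-drift data as update set, 80% as test set."""
    rng = np.random.default_rng(seed)
    records = rng.permutation(records)
    n = len(records)
    train = records[: n // 2]               # 50% original data
    post = induce_drift(records[n // 2:])   # remaining data, drifted
    k = int(0.2 * len(post))
    update, test = post[:k], post[k:]       # 20% update / 80% test
    return train, update, test


# Example with 1000 synthetic records: 500 train, 100 update, 400 test.
train, update, test = split_with_drift(np.arange(1000))
```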