reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

In-Context Adaptation to Concept Drift for Learned Database Operations

Authors: Jiaqi Zhu, Shaofeng Cai, Yanyan Shen, Gang Chen, Fang Deng, Beng Chin Ooi

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across key database tasks demonstrate that FLAIR outperforms state-of-the-art baselines, achieving up to 5.2× faster adaptation and reducing error by 22.5% for cardinality estimation.
Researcher Affiliation	Academia	¹Beijing Institute of Technology, Beijing, China; ²National University of Singapore, Singapore; ³Shanghai Jiao Tong University, Shanghai, China; ⁴Zhejiang University, Hangzhou, China.
Pseudocode	Yes	Algorithm 1 FLAIR Training; Algorithm 2 Concurrent FLAIR Inference and Adaptation
Open Source Code	Yes	The code and data of FLAIR are available at https://anonymous.4open.science/r/FLAIR-D4DA/
Open Datasets	Yes	We evaluate FLAIR on two established real-world benchmarks: STATS (STA, 2015) and JOB-light (Leis et al., 2018; 2015). STATS contains over 1 million records, while JOB-light, derived from the IMDB dataset, includes 62 million records.
Dataset Splits	Yes	our evaluation involves randomly generating 2000 diverse queries with sub-queries to form the training set for each benchmark. In the STATS benchmark, we utilize an existing workload of 146 queries with 2603 sub-queries as the test set. For JOB-light, the test set comprises 70 queries associated with 696 sub-queries. ... We allocate 50% of the original data as the training set, and following prior setups, induced data drift on the remaining data. We designate 20% of the post-drift data as the update set and the remaining post-drift data as the test set.
Hardware Specification	Yes	All the experiments are conducted on a server with a Xeon(R) Silver 4214R CPU @ 2.40GHz (12 cores), 128G memory, and a GeForce RTX 3090 with CUDA 11.8.
Software Dependencies	Yes	FLAIR is implemented in Python with PyTorch 2.0.1. ... The experiments involving PostgreSQL are conducted on PostgreSQL 13.1. ... CUDA 11.8.
Experiment Setup	Yes	For FLAIR, the queue size ϱ is set to 80, unless specified otherwise. ... We conduct a sensitivity analysis by varying the number of bins δ used in data encoding. Specifically, we test δ ∈ {10, 20, 40, 60, 80} across four scenarios...
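
The data split quoted in the Dataset Splits row (50% of the original data for training, then 20% of the post-drift remainder as the update set and the rest as the test set) works out to roughly 50% / 10% / 40% of the original rows. The Python sketch below only illustrates that arithmetic. The function name `split_indices`, its parameters, and the random shuffling are assumptions made for illustration and are not taken from the FLAIR code; the drift-induction step itself is not modelled.

```python
import random


def split_indices(n_rows, train_frac=0.5, update_frac=0.2, seed=0):
    """Return (train, update, test) index lists using the quoted fractions.

    Hypothetical helper: the paper does not state how rows are assigned,
    so random assignment with a fixed seed is assumed here.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # assumption: random row assignment

    # 50% of the original data forms the training set.
    n_train = round(n_rows * train_frac)
    train, post_drift = idx[:n_train], idx[n_train:]

    # Drift would be induced on `post_drift` at this point (not modelled).

    # 20% of the post-drift data is the update set; the rest is the test set.
    n_update = round(len(post_drift) * update_frac)
    update, test = post_drift[:n_update], post_drift[n_update:]
    return train, update, test


train, update, test = split_indices(1000)
print(len(train), len(update), len(test))  # 500 100 400
```

For 1000 rows this yields 500 training, 100 update, and 400 test indices, with no index shared between the three sets.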