TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Authors: Yuchen Zeng, Tuan Dinh, Wonjun Kang, Andreas C Mueller

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluations demonstrate that TABFLEX can achieve over a 2× speedup compared to TABPFN and a 1.5× speedup over XGBoost, outperforming 25 tested baselines in terms of efficiency across a diverse range of datasets.
Researcher Affiliation | Collaboration | 1 Work done during an internship at the Gray Systems Lab, Microsoft; 2 University of Wisconsin-Madison; 3 University of California San Francisco; 4 Furiosa AI; 5 Seoul National University; 6 Gray Systems Lab, Microsoft. Correspondence to: Andreas C. Müller <EMAIL>.
Pseudocode | Yes | Algorithm 1 Conditional Model Selection. Input: a dataset D with n instances of d features.
Open Source Code | Yes | Our code is available at https://github.com/microsoft/ticl.
Open Datasets | Yes | We evaluate TABFLEX's performance and speed across 115 OpenML tabular datasets (Vanschoren et al., 2013).
Dataset Splits | Yes | For each dataset, we consider ten different train/test splits, computing the score mean and standard deviation, as well as the total runtime per 1000 instances.
Hardware Specification | Yes | Each model is trained on a single Nvidia A100 80GB PCIe GPU.
Software Dependencies | No | In our implementation, we adopt a straightforward PyTorch approach to linear attention rather than an HBM-efficient method. We employ the concise two-line implementation presented in Listing 1. In the following lemma, we demonstrate that this straightforward implementation only incurs a marginal increase in HBM accesses and HBM memory usage.
Experiment Setup | Yes | Table 6 summarizes the hyperparameters selected for training TABFLEX-S100, TABFLEX-L100, and TABFLEX-H1K. For all three methods, we utilize the same embedding size of 512, consistent with TABPFN.
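The Software Dependencies row quotes the paper's "straightforward PyTorch approach to linear attention" (Listing 1, not reproduced here). The core trick is associativity: compute K^T V first so the n×n attention matrix is never formed. A minimal NumPy sketch of that idea follows; the feature map and epsilon are common choices in the linear-attention literature, not necessarily the paper's exact Listing 1.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map often used in linear attention
    # (assumption: the paper's Listing 1 may use a different map).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Attention in O(n*d*dv): associate (K^T V) first instead of
    forming the n-by-n attention matrix."""
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)
    KV = Kp.T @ V                    # (d, dv) summary, cost linear in n
    Z = Qp @ Kp.sum(axis=0)          # per-query normalizer, shape (n,)
    return (Qp @ KV) / (Z[:, None] + eps)
```

The output matches the quadratic formulation (normalized φ(Q)φ(K)^T V) up to floating-point error, while memory and time scale linearly in the number of rows n rather than quadratically.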
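The Pseudocode row cites Algorithm 1, a conditional model selection rule over a dataset with n instances and d features. A hypothetical sketch of such a dispatch is shown below, using the variant names from the hyperparameter table; the thresholds are illustrative placeholders, not the actual conditions of Algorithm 1.

```python
def select_model(n_instances: int, n_features: int) -> str:
    """Pick a TabFlex variant from dataset shape.

    The thresholds below are made-up placeholders for illustration;
    the paper's Algorithm 1 defines the real selection conditions.
    """
    if n_features > 100:
        return "TabFlex-H1K"   # variant for high-dimensional inputs
    if n_instances > 10_000:
        return "TabFlex-L100"  # variant for larger sample counts
    return "TabFlex-S100"      # default small-scale variant
```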