Small Models are LLM Knowledge Triggers for Medical Tabular Prediction

Authors: Jiahuan Yan, Jintai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on widely used medical domain tabular datasets show that, without access to gold labels, applying SERSAL to OpenAI GPT's reasoning process attains substantial improvement compared to linguistic prompting methods, which serves as an orthogonal direction for tabular LLMs, and an increasing prompting bonus is observed as more powerful LLMs appear. Codes are available at https://github.com/jyansir/sersal.
Researcher Affiliation | Academia | 1. College of Computer Science and Technology, Zhejiang University; 2. Thrust of Artificial Intelligence, Information Hub, HKUST (GZ); 3. Computer Science Department, University of Illinois Urbana-Champaign; 4. The Second Affiliated Hospital, Zhejiang University School of Medicine; 5. Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence
Pseudocode | Yes | Algorithm 1 (Unsupervised SERSAL). Line 2: LLM pseudo-labeling (Sec. 2.1); Lines 3-5: small model teaching (Sec. 2.2); Line 6: quality control (Sec. 2.3); Lines 7-9: reverse tuning (Sec. 2.4).
Open Source Code | Yes | Codes are available at https://github.com/jyansir/sersal.
Open Datasets | Yes | We evaluate on ten widely recognized medical diagnosis tabular datasets on various diseases: Heart Failure Prediction (HF, Detrano et al. (1989)), Lung Cancer Prediction (LC, Ahmad & Mayya (2020)), Early Classification of Diabetes (ECD, Islam et al. (2020)), Indian Liver Patient Records (LI, Ramana et al. (2012)), Hepatitis C Prediction (HE, Hoffmann et al. (2018)), Pima Indians Diabetes Database (PID, Smith et al. (1988)), Framingham Heart Study (FH, O'Donnell & Elosua (2008)), Stroke Prediction (ST, Fedesoriano (2020)), COVID-19 Presence (CO, Hemanthhari (2020)) and Anemia Disease (AN, Kilicarslan et al. (2021)).
Dataset Splits | Yes | We split each tabular dataset (80% for training and 20% for testing), and keep the same label distribution in each split.
Hardware Specification | Yes | All experiments are conducted with PyTorch on Python 3.8 and run on an NVIDIA RTX 3090.
Software Dependencies | Yes | All experiments are conducted with PyTorch on Python 3.8 and run on an NVIDIA RTX 3090.
Experiment Setup | Yes | For the small model, we uniformly use FT-Transformer with the default model and training configurations provided in the original paper (Gorishniy et al., 2021). For SERSAL, the only adjustable hyper-parameter is the temperature of DivideMix (Li et al., 2019), with choices of 0.5, 5.0 and 10.0 in line 5 of Algorithm 1, which is selected by the metric on the early stopping set (D(t)_es in line 4 of Algorithm 1). ... Additionally, we uniformly set the early stopping patience m to 5. The best temperature is selected based on the training loss of the early stopping subset D_es.
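The Algorithm 1 structure quoted in the Pseudocode row (LLM pseudo-labeling, small model teaching, quality control, reverse tuning) can be sketched as a minimal loop. All function bodies below are hypothetical placeholders, not the authors' implementation; they only mirror the control flow of Unsupervised SERSAL.

```python
def llm_pseudo_label(rows):
    # Line 2: zero-shot LLM annotation of unlabeled tabular rows.
    # Placeholder: a real call would prompt an LLM with a serialized row.
    return [0 for _ in rows]

def train_small_model(rows, labels):
    # Lines 3-5: teach a small model (e.g. FT-Transformer) on the noisy
    # LLM labels, typically with a noisy-label method such as DivideMix.
    # Placeholder: returns a constant classifier.
    return lambda row: 0

def quality_control(rows, labels, model):
    # Line 6: keep only samples where the small model agrees with the
    # pseudo label (a simple stand-in for the paper's quality control).
    return [(r, y) for r, y in zip(rows, labels) if model(r) == y]

def reverse_tune(clean_subset):
    # Lines 7-9: feed the filtered subset back to refine the LLM's prior.
    # No-op placeholder here.
    pass

def sersal(rows, rounds=3):
    model = None
    for _ in range(rounds):
        labels = llm_pseudo_label(rows)          # Line 2
        model = train_small_model(rows, labels)  # Lines 3-5
        clean = quality_control(rows, labels, model)  # Line 6
        reverse_tune(clean)                      # Lines 7-9
    return model

model = sersal([{"age": 50}, {"age": 63}])
```

The point of the loop is that the LLM and the small model alternately teach each other without any gold labels; each placeholder above marks where a real component would plug in.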
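The Dataset Splits row describes an 80/20 split that keeps the same label distribution in each part. A minimal sketch of such a stratified split (pure NumPy, illustrative labels; not the authors' code):

```python
import numpy as np

def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) preserving the label distribution."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Illustrative binary labels with an 80:20 class ratio
y = np.array([0] * 80 + [1] * 20)
train_idx, test_idx = stratified_split(y)
# Both splits keep the 80:20 class ratio of the full dataset
```

In practice the same effect is achieved with `sklearn.model_selection.train_test_split(..., test_size=0.2, stratify=y)`.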
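The Experiment Setup row states that SERSAL's only tunable hyper-parameter, the DivideMix temperature, is chosen from {0.5, 5.0, 10.0} by the training loss on the early stopping subset D_es. A hedged sketch of that selection, where `es_loss` is a hypothetical stand-in (a dummy loss surface, not a real training run):

```python
def es_loss(temperature):
    # Hypothetical stand-in for training SERSAL with a given DivideMix
    # temperature and returning the training loss on the early stopping
    # subset D_es. Dummy convex surface for illustration only.
    return (temperature - 5.0) ** 2 + 0.1

def select_temperature(candidates=(0.5, 5.0, 10.0)):
    # Pick the candidate temperature with the lowest D_es loss,
    # mirroring the selection rule quoted in the setup description.
    return min(candidates, key=es_loss)
```

With the dummy surface above, `select_temperature()` returns 5.0; a real run would substitute the actual D_es training loss.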