Small Models are LLM Knowledge Triggers for Medical Tabular Prediction

Authors: Jiahuan Yan, Jintai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on widely used medical domain tabular datasets show that, without access to gold labels, applying SERSAL to OpenAI GPT's reasoning process attains substantial improvement compared to linguistic prompting methods, which serves as an orthogonal direction for tabular LLMs, and an increasing prompting bonus is observed as more powerful LLMs appear. Codes are available at https://github.com/jyansir/sersal.
Researcher Affiliation | Academia | 1. College of Computer Science and Technology, Zhejiang University; 2. Thrust of Artificial Intelligence, Information Hub, HKUST (GZ); 3. Computer Science Department, University of Illinois Urbana-Champaign; 4. The Second Affiliated Hospital, Zhejiang University School of Medicine; 5. Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence
Pseudocode | Yes | Algorithm 1 (Unsupervised SERSAL). Line 2: LLM pseudo-labeling (Sec. 2.1); Lines 3-5: small model teaching (Sec. 2.2); Line 6: quality control (Sec. 2.3); Lines 7-9: reverse tuning (Sec. 2.4).
Open Source Code | Yes | Codes are available at https://github.com/jyansir/sersal.
Open Datasets | Yes | We evaluate on ten widely recognized medical diagnosis tabular datasets on various diseases: Heart Failure Prediction (HF, Detrano et al. (1989)), Lung Cancer Prediction (LC, Ahmad & Mayya (2020)), Early Classification of Diabetes (ECD, Islam et al. (2020)), Indian Liver Patient Records (LI, Ramana et al. (2012)), Hepatitis C Prediction (HE, Hoffmann et al. (2018)), Pima Indians Diabetes Database (PID, Smith et al. (1988)), Framingham Heart Study (FH, O'Donnell & Elosua (2008)), Stroke Prediction (ST, Fedesoriano (2020)), COVID-19 Presence (CO, Hemanthhari (2020)) and Anemia Disease (AN, Kilicarslan et al. (2021)).
Dataset Splits | Yes | We split each tabular dataset (80% for training and 20% for testing), and keep the same label distribution in each split.
Hardware Specification | Yes | All experiments are conducted with PyTorch on Python 3.8 and run on an NVIDIA RTX 3090.
Software Dependencies | Yes | All experiments are conducted with PyTorch on Python 3.8 and run on an NVIDIA RTX 3090.
Experiment Setup | Yes | For the small model, we uniformly use FT-Transformer with the default model and training configurations provided in the original paper (Gorishniy et al., 2021). For SERSAL, the only adjustable hyper-parameter is the temperature of DivideMix (Li et al., 2019), with choices of 0.5, 5.0 and 10.0 in line 5 of Algorithm 1, which is selected by the metric on the early stopping set (D(t)_es in line 4 of Algorithm 1). ... Additionally, we uniformly set the early stopping patience m to 5. The best temperature is selected based on the training loss of the early stopping subset D_es.
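The Algorithm 1 structure quoted in the Pseudocode row (LLM pseudo-labeling, small model teaching, quality control, reverse tuning) can be sketched as a minimal loop. All function bodies below are hypothetical placeholders, not the authors' implementation; they only mirror the control flow of Unsupervised SERSAL.

```python
def llm_pseudo_label(rows):
    # Line 2: zero-shot LLM annotation of unlabeled tabular rows.
    # Placeholder: a real call would prompt an LLM with a serialized row.
    return [0 for _ in rows]

def train_small_model(rows, labels):
    # Lines 3-5: teach a small model (e.g. FT-Transformer) on the noisy
    # LLM labels, typically with a noisy-label method such as DivideMix.
    # Placeholder: returns a constant classifier.
    return lambda row: 0

def quality_control(rows, labels, model):
    # Line 6: keep only samples where the small model agrees with the
    # pseudo label (a simple stand-in for the paper's quality control).
    return [(r, y) for r, y in zip(rows, labels) if model(r) == y]

def reverse_tune(clean_subset):
    # Lines 7-9: feed the filtered subset back to refine the LLM's prior.
    # No-op placeholder here.
    pass

def sersal(rows, rounds=3):
    model = None
    for _ in range(rounds):
        labels = llm_pseudo_label(rows)          # Line 2
        model = train_small_model(rows, labels)  # Lines 3-5
        clean = quality_control(rows, labels, model)  # Line 6
        reverse_tune(clean)                      # Lines 7-9
    return model

model = sersal([{"age": 50}, {"age": 63}])
```

The point of the loop is that the LLM and the small model alternately teach each other without any gold labels; each placeholder above marks where a real component would plug in.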
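The Dataset Splits row describes an 80/20 split that keeps the same label distribution in each part. A minimal sketch of such a stratified split (pure NumPy, illustrative labels; not the authors' code):

```python
import numpy as np

def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) preserving the label distribution."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Illustrative binary labels with an 80:20 class ratio
y = np.array([0] * 80 + [1] * 20)
train_idx, test_idx = stratified_split(y)
# Both splits keep the 80:20 class ratio of the full dataset
```

In practice the same effect is achieved with `sklearn.model_selection.train_test_split(..., test_size=0.2, stratify=y)`.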
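The Experiment Setup row states that SERSAL's only tunable hyper-parameter, the DivideMix temperature, is chosen from {0.5, 5.0, 10.0} by the training loss on the early stopping subset D_es. A hedged sketch of that selection, where `es_loss` is a hypothetical stand-in (a dummy loss surface, not a real training run):

```python
def es_loss(temperature):
    # Hypothetical stand-in for training SERSAL with a given DivideMix
    # temperature and returning the training loss on the early stopping
    # subset D_es. Dummy convex surface for illustration only.
    return (temperature - 5.0) ** 2 + 0.1

def select_temperature(candidates=(0.5, 5.0, 10.0)):
    # Pick the candidate temperature with the lowest D_es loss,
    # mirroring the selection rule quoted in the setup description.
    return min(candidates, key=es_loss)
```

With the dummy surface above, `select_temperature()` returns 5.0; a real run would substitute the actual D_es training loss.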