LLM-Guided Self-Supervised Tabular Learning With Task-Specific Pre-text Tasks
Authors: Sungwon Han, Seungeon Lee, Meeyoung Cha, Sercan Ö. Arik, Jinsung Yoon
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TST-LLM consistently outperforms contemporary baselines with win ratios of 95% and 81% when applied to 22 benchmark tabular datasets, including binary and multi-class classification and regression tasks. (Section 4, Experiment:) We evaluate TST-LLM across multiple tabular datasets with various downstream tasks. Through our experiments, we discuss which components of the model contributed to performance enhancements and how our model operates. |
| Researcher Affiliation | Collaboration | Sungwon Han, Korea Advanced Institute of Science and Technology (KAIST); Seungeon Lee, Korea Advanced Institute of Science and Technology (KAIST); Meeyoung Cha, Max Planck Institute for Security and Privacy; Sercan Ö. Arik, Google Cloud AI; Jinsung Yoon, Google Cloud AI |
| Pseudocode | Yes | Algorithm 1: Algorithm for TST-LLM. Input: Original dataset D, large language model backbone LLM, encoder f, original input feature set Y, number of features to select M, entropy threshold t_ent, meta-information E_task, E_name, and E_desc. Output: Trained encoder f. ... Algorithm 2: Algorithm for feature selection with minimum redundancy. |
| Open Source Code | Yes | Our code is available on Github1. 1https://github.com/Sungwon-Han/TST-LLM |
| Open Datasets | Yes | Adult (Asuncion & Newman, 2007), Balance-scale (Siegler, 1994), Bank (Moro et al., 2014), Blood (Yeh et al., 2009), Car (Kadra et al., 2021), Communities (Redmond, 2009), Credit-g (Kadra et al., 2021), Diabetes (Smith et al., 1988), Eucalyptus (Bulloch et al., 1991), Forest-fires (Cortez & Morais, 2008), Heart (fedesoriano, 2021), Junglechess (van Rijn & Vis, 2014), Myocardial (Golovenkin et al., 2020), Tic-tac-toe (Aha, 1991), Vehicle (Mowforth & Shepherd), Bike (Fanaee-T, 2013), Crab (Sidhu, 2021), Housing (Pace & Barry, 1997), Insurance (Datta, 2020), Wine (Cortez & Reis, 2009), Sequence-type, and Solution-mix. Descriptive statistics and task descriptions for each dataset are available in the Appendix A.1 and B. |
| Dataset Splits | No | The paper lists multiple benchmark datasets, but does not explicitly state the train/test/validation split ratios or methodology used for these datasets in the main text or appendices. It mentions "Experiments were run with 3 different random seeds, and the average values were reported" but this does not specify data splits. |
| Hardware Specification | Yes | The comparison was conducted on the Adult dataset using a single A100 GPU. |
| Software Dependencies | No | The paper mentions using GPT-3.5 as the LLM backbone and Adam optimizer, but does not specify version numbers for any software libraries, frameworks, or programming languages used for implementation. |
| Experiment Setup | Yes | During LLM generation, the temperature was set to 0.5 and the top-p value was set to the API's default of 1. The discovery process generated five features per trial, with the number of trials set at 40. [...] The number of selected features M was set to 20. [...] Training utilized the Adam optimizer with a learning rate of 1e-4, a batch size of 128, and 1000 training iterations. |
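The table quotes Algorithm 2 ("feature selection with minimum redundancy") by name only. A minimal greedy sketch of that idea, assuming a simple absolute-Pearson-correlation redundancy measure (the paper's actual relevance/redundancy scoring is not reproduced in this report, and the seed choice below is arbitrary):

```python
import numpy as np

def select_min_redundancy(features: np.ndarray, m: int) -> list[int]:
    """Greedily pick m column indices, each step choosing the candidate
    least correlated (on average) with the already-selected set.
    Hypothetical sketch of a minimum-redundancy selector; not the
    paper's exact Algorithm 2."""
    n_feat = features.shape[1]
    # Absolute pairwise Pearson correlation as the redundancy measure.
    corr = np.abs(np.corrcoef(features, rowvar=False))
    selected = [0]  # arbitrary starting feature
    while len(selected) < m:
        candidates = [j for j in range(n_feat) if j not in selected]
        # Mean correlation of each candidate to the selected set.
        redundancy = [corr[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmin(redundancy))])
    return selected
```

With M = 20 as in the quoted setup, this would reduce an LLM-generated candidate pool to 20 columns before self-supervised training.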