Language Models Are Good Tabular Learners

Authors: Zhenhan Huang, Kavitha Srinivas, Horst Samulowitz, Niharika S. D'Souza, Charu C. Aggarwal, Pin-Yu Chen, Jianxi Gao

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine the performance of TDTransformer on the standard tabular data benchmark OpenML. Extensive experiments on more than 70 tabular data sets show the superiority of TDTransformer. In summary, the main contributions of this work are as follows: (...) We also test the performance using RoBERTa (Liu, 2019) as the backbone model (see Appendix). The [CLS] embedding is used for the prediction. The training pipeline, similar to the classic pre-training and fine-tuning paradigm, consists of two steps: the first step pre-trains the model; the second step fine-tunes the model initialized with the pre-trained weights.
Researcher Affiliation | Collaboration | Zhenhan Huang, Department of Computer Science, Rensselaer Polytechnic Institute; Kavitha Srinivas, IBM Research; Horst Samulowitz, IBM Research; Niharika S. D'Souza, IBM Research; Charu C. Aggarwal, IBM Research; Pin-Yu Chen, IBM Research; Jianxi Gao, Department of Computer Science, Rensselaer Polytechnic Institute
Pseudocode | No | The paper includes a figure (Figure 1) illustrating the TDTransformer framework pipeline, but it does not contain any structured pseudocode or algorithm blocks describing the methodology step by step in a code-like format.
Open Source Code | Yes | We release our code at https://github.com/Zhenhan-Huang/TDTransformer.
Open Datasets | Yes | We examine the performance of TDTransformer on the standard tabular data benchmark OpenML. Extensive experiments on more than 70 tabular data sets show the superiority of TDTransformer. (...) We use 76 real-world tabular classification datasets in the standard OpenML benchmark (which are manually curated for effective benchmarking). The details of the tables are given in Appendix Section A.4. OpenML benchmark: https://www.openml.org/
Dataset Splits | Yes | We use 76 real-world tabular classification datasets in the standard OpenML benchmark (which are manually curated for effective benchmarking). The train/validation/test split is 72%/8%/20% for each OpenML dataset. We use accuracy as the metric to measure performance for all classification data sets.
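The reported 72%/8%/20% split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, seeding, and rounding behavior are assumptions.

```python
import random

def split_indices(n, train_frac=0.72, val_frac=0.08, seed=0):
    """Partition n sample indices into train/validation/test sets.

    The 72%/8%/20% ratio follows the paper's stated OpenML splits;
    the shuffling seed and floor-based rounding are illustrative choices.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    # Remaining samples (here 20%) form the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
# 720 train, 80 validation, 200 test indices
```

In practice a library helper such as scikit-learn's `train_test_split` (applied twice) would give the same partition sizes with stratification support.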
Hardware Specification | Yes | We conducted all experiments using a single A40 Tensor Core GPU and an EPYC 7232P CPU.
Software Dependencies | No | TDTransformer uses the pre-trained BERT tokenizer (Devlin, 2018) and the Adam optimizer (Kingma, 2014) without weight decay. The hidden dimension is 512 and the model depth is 12. The number of quantiles for PLE is 64. In both the pre-training and fine-tuning processes, we use an early stopping strategy (Yao et al., 2007) with a patience of 10. The maximum number of training epochs is 200, with a batch size of 128. The corruption parameter of the pre-training process is set to 0.5. When there are empty cells in a column, we replace them with the most common value in that column. While software components such as the BERT tokenizer and the Adam optimizer are mentioned, specific version numbers for these or other crucial libraries (e.g., PyTorch, TensorFlow) are not provided.
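The stated missing-value handling (replace empty cells with the column's most common value) amounts to mode imputation. A minimal sketch, assuming empty cells are represented as `None`; the function name is hypothetical:

```python
from collections import Counter

def impute_most_common(column):
    """Replace empty cells (None) with the most common non-empty value,
    mirroring the paper's stated handling of missing cells in a column."""
    observed = [v for v in column if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

impute_most_common(["a", None, "b", "a", None])
# → ["a", "a", "b", "a", "a"]
```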
Experiment Setup | Yes | The hidden dimension is 512 and the model depth is 12. The number of quantiles for PLE is 64. In both the pre-training and fine-tuning processes, we use an early stopping strategy (Yao et al., 2007) with a patience of 10. The maximum number of training epochs is 200, with a batch size of 128. The corruption parameter of the pre-training process is set to 0.5.
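The early-stopping schedule (patience 10, at most 200 epochs) can be sketched as below. The patience and epoch cap match the paper; the exact stopping rule (strict improvement in validation loss) is an assumption, and the helper is illustrative rather than the authors' implementation.

```python
def early_stop_epochs(val_losses, patience=10, max_epochs=200):
    """Return the number of epochs run under early stopping.

    Training halts once validation loss has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached."""
    best = float("inf")
    since_improve = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            since_improve = 0
        else:
            since_improve += 1
        if since_improve >= patience:
            return epoch  # stopped early
    return min(len(val_losses), max_epochs)
```

For example, a run whose validation loss improves for three epochs and then plateaus would stop ten epochs after the last improvement, well short of the 200-epoch cap.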