Deep Tabular Learning via Distillation and Language Guidance

Authors: Ruohan Wang, Wenhao Fu, Carlo Ciliberto

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, DisTab outperforms existing tabular DL models and is highly competitive against tree-based models across diverse datasets, effectively closing the gap with these methods. [...] Empirically, we conduct an extensive comparison of DisTab against existing tabular learning approaches across diverse tabular datasets. Our results demonstrate that DisTab not only outperforms existing tabular DL methods but also achieves competitive performance against GBDT models. Furthermore, we conduct comprehensive ablation studies on DisTab, where we systematically analyze the contributions of each of its components.
Researcher Affiliation | Academia | Ruohan Wang (EMAIL), Institute for Infocomm Research (I2R), A*STAR, Singapore; Wenhao Fu (EMAIL), Institute for Infocomm Research (I2R), A*STAR, Singapore; Carlo Ciliberto (EMAIL), AI Center, University College London
Pseudocode | No | The paper describes the model architecture, embedding functions, pre-training process, and overall algorithm in dedicated sections (3.1, 3.2, 3.3) with mathematical formulas and descriptive text. However, it does not include any clearly labeled pseudocode or algorithm blocks with structured steps in a code-like format.
Open Source Code | Yes | Our code is available at https://github.com/RuohanW/DisTab
Open Datasets | Yes | We use 25 datasets from OpenML for all evaluations (see Appendix A for details). We follow the datasets used in Zhu et al. (2023), but focus on those with meaningful textual column headers, since they allow us to apply and evaluate the proposed language-guided embeddings. For each OpenML dataset, we use the default train/test splits defined by the OpenML library to ensure better reproducibility (10% of the data is reserved for testing in each split).
Dataset Splits | Yes | For each OpenML dataset, we use the default train/test splits defined by the OpenML library to ensure better reproducibility (10% of the data is reserved for testing in each split). For each training split, we randomly partition 90% of the data for training and the rest for validation. All methods are trained and evaluated using the same splits.
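As a hedged illustration (not the authors' code), the split protocol reported above — a random 90/10 train/validation partition of each OpenML training split — can be sketched in plain Python. The index range and seed below are placeholders.

```python
import random

def partition_train_val(train_indices, val_fraction=0.10, seed=0):
    """Randomly hold out a validation set from a training split,
    mirroring the 90/10 train/validation partition described above."""
    rng = random.Random(seed)
    indices = list(train_indices)
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    # First n_val shuffled indices become validation, the rest training.
    return indices[n_val:], indices[:n_val]

# Example: a hypothetical 1000-example OpenML training split
train_idx, val_idx = partition_train_val(range(1000))
```

Fixing the seed keeps the partition reproducible, so all compared methods can be trained and evaluated on identical splits, as the review notes.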
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | Yes | We use the recommended hyperparameters (see Tab. 5, Tab. 6, Tab. 7, Tab. 8), early stopping strategy, and feature pre-processing implemented in the AutoGluon (Erickson et al., 2020) 1.0.0 release for each tree-based model, which achieves strong performance on the evaluated datasets.
Experiment Setup | Yes | For DisTab, we use a batch size of 1024 for pre-training and 128 during fine-tuning. For the existing DL tabular methods, we use a batch size of 128 for both pre-training and fine-tuning, as recommended in Bahri et al. (2022); Zhu et al. (2023). All DL-based methods use the Adam optimizer with a learning rate of 1e-4 and a weight decay of 1e-5, following Gorishniy et al. (2021); Rubachev et al. (2022). The numbers of pre-training and fine-tuning epochs are empirically determined for each method, but remain consistent across different tasks. For DisTab, we use 30 epochs for pre-training and 20 for fine-tuning. [...] We performed grid search for all methods to determine the key hyper-parameter values reported in the previous section. For each task, we use the validation performance on the first train/test split, as specified by OpenML, to guide the grid search. Validation performance across different tasks is averaged to select the best performing hyper-parameter configuration for each model. We use a single set of hyper-parameters for different tasks...
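A minimal sketch of the selection protocol quoted above — a grid search that picks one shared hyper-parameter configuration by averaging validation performance across tasks. The grid values, task names, and scoring function below are hypothetical, not taken from the paper.

```python
from itertools import product
from statistics import mean

def select_shared_config(grid, tasks, validate):
    """Grid search for a single shared configuration: each candidate
    is scored by its validation performance averaged over all tasks,
    and the configuration with the best average is selected."""
    keys = sorted(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = mean(validate(task, config) for task in tasks)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy example: two tasks whose (made-up) validation scores both
# favor the lower learning rate.
grid = {"lr": [1e-4, 1e-3], "weight_decay": [1e-5]}
scores = {("t1", 1e-4): 0.9, ("t1", 1e-3): 0.7,
          ("t2", 1e-4): 0.8, ("t2", 1e-3): 0.6}
best, _ = select_shared_config(
    grid, ["t1", "t2"], lambda task, c: scores[(task, c["lr"])])
```

Averaging across tasks before choosing enforces the paper's constraint of a single hyper-parameter set for all tasks, rather than tuning per task.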