Understanding the Limits of Deep Tabular Methods with Temporal Shift

Authors: Haorun Cai, Han-Jia Ye

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that this temporal embedding, combined with the improved training protocol, provides a more effective and robust framework for learning from temporal tabular data."
Researcher Affiliation | Academia | "1) School of Artificial Intelligence, Nanjing University, China; 2) National Key Laboratory for Novel Software Technology, Nanjing University, China. Correspondence to: Han-Jia Ye <EMAIL>."
Pseudocode | No | The paper describes its methods verbally and mathematically, including equations for the Fourier series expansion and the temporal embedding, but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our source code is now available at https://github.com/LAMDA-Tabular/Tabular-Temporal-Shift."
Open Datasets | Yes | "We adopt the full TabReD dataset (Rubachev et al., 2025) without modification. ... Table 3. Overview of Datasets. Task descriptions from Rubachev et al. (2025)."
Dataset Splits | Yes | "TabReD (Rubachev et al., 2025) adopts a temporal split where the data is divided at T_val, such that D_train = ⋃_{t ≤ T_val} D_t and D_val = ⋃_{T_val < t ≤ T_train} D_t. ... The random split is subject to variability from both the split selection and the running seeds during the training phase."
Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running the experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | "We tune hyperparameters using Optuna (Akiba et al., 2019)... For all deep learning methods, we use a batch size of 1024 and AdamW (Loshchilov & Hutter, 2019) as the optimizer..." While specific tools (Optuna, AdamW) are mentioned with citations, explicit version numbers for core software components such as Python, PyTorch/TensorFlow, or other libraries are not provided.
Experiment Setup | Yes | "We tune hyperparameters using Optuna (Akiba et al., 2019), performing 100 trials for most methods... For all deep learning methods, we use a batch size of 1024 and AdamW (Loshchilov & Hutter, 2019) as the optimizer, with an early stopping patience of 16."
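The Fourier-series temporal embedding quoted in the Pseudocode row can be sketched as follows. This is an illustrative reconstruction only: the function name, the choice of harmonics, and the period normalization are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fourier_temporal_embedding(t, num_frequencies=4, period=1.0):
    """Map scalar timestamps to sin/cos features of increasing frequency.

    Illustrative sketch of a Fourier-series temporal embedding; the
    paper's exact frequencies and normalization may differ. `t` is an
    array of timestamps assumed to be scaled into [0, period].
    """
    t = np.asarray(t, dtype=float)[:, None]          # shape (N, 1)
    k = np.arange(1, num_frequencies + 1)[None, :]   # harmonics, shape (1, K)
    angles = 2.0 * np.pi * k * t / period            # shape (N, K)
    # Concatenate sine and cosine components -> shape (N, 2K)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
```

The resulting 2K-dimensional vector can be concatenated to the tabular features so the model can condition on when a row was observed.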
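The temporal split quoted in the Dataset Splits row amounts to cutting the timeline at fixed boundaries rather than shuffling rows. A minimal sketch, assuming two cutoff timestamps (the parameter names `t_val` and `t_train_end` are illustrative, not the paper's notation):

```python
import numpy as np

def temporal_split(timestamps, t_val, t_train_end):
    """Boolean masks for a temporal split of time-indexed rows.

    train = {t <= t_val}, val = {t_val < t <= t_train_end},
    test  = the remaining, strictly later rows. Minimal sketch of the
    TabReD-style split described above, not the authors' implementation.
    """
    t = np.asarray(timestamps)
    train = t <= t_val
    val = (t > t_val) & (t <= t_train_end)
    test = t > t_train_end
    return train, val, test
```

Because the boundaries are fixed, repeated runs vary only through training seeds, whereas a random split additionally varies with the split selection itself, as the quoted passage notes.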