Fully Test-time Adaptation for Tabular Data
Authors: Zhi Zhou, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on six benchmark datasets, which are evaluated using three metrics. The experimental results demonstrate that FTAT outperforms state-of-the-art methods by a margin. |
| Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, Nanjing University, China 2School of Artificial Intelligence, Nanjing University, China 3School of Intelligence Science and Technology, Nanjing University, China |
| Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Project Homepage https://wnjxyk.github.io/FTTA |
| Open Datasets | Yes | We conduct comprehensive experiments on six benchmark datasets, which are evaluated using three metrics. ... We select six common tabular benchmark datasets from the TableShift benchmark, which exhibit significant performance gaps under distribution shifts. ... Gardner, Popovic, and Schmidt 2023. Benchmarking Distribution Shift in Tabular Data with TableShift. In Advances in Neural Information Processing Systems. |
| Dataset Splits | Yes | In our experiments on tabular tasks, we follow the fully test-time adaptation setting, where the source model is trained on training data and adapted to shifted test data without any access to the source training data. Specifically, we train the source model on training data and select the best model based on the validation set following the TableShift benchmark (Gardner, Popovic, and Schmidt 2023). Then, the FTAT approach and existing FTTA methods are evaluated on the shifted test set. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experimental setup. |
| Experiment Setup | Yes | As shown in Fig. 2, the optimal learning rates for different backbone models on the same task and the same backbone model on different tasks vary. ... Here, we compare with four base models with different learning rates {1e-3, 1e-4, 5e-4, 1e-5}. ... In the main experiments, the batch size of the data stream is set to 512 ... batch sizes set to {64, 128, 256, 512, 1024}. ... FTAT contains three hyperparameters, i.e., ϵ, α and β. ... with α in {0.08, 0.09, 0.10, 0.11, 0.15, 0.20}, ϵ = Entropy([p, 1 − p]) where p was set to {0.72, 0.71, 0.70, 0.69, 0.65, 0.60}, and β in {0.28, 0.29, 0.30, 0.31, 0.40, 0.50}. |
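The setup row above parameterizes the threshold ϵ via the binary entropy of [p, 1 − p]. As an illustration only (the paper does not state the logarithm base; natural log is assumed here), the ϵ values implied by the listed p grid can be computed with a short sketch:

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy of the distribution [p, 1 - p].

    Natural log is an assumption; the paper does not specify the base.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# p grid quoted in the Experiment Setup row; each p induces one epsilon.
for p in (0.72, 0.71, 0.70, 0.69, 0.65, 0.60):
    print(f"p={p:.2f}  eps={binary_entropy(p):.4f}")
```

Note that larger p (a more confident prediction) yields a smaller entropy threshold ϵ, so the grid spans progressively stricter confidence cutoffs.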