Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data
Authors: Moonjung Eo, Kyungeun Lee, Hye-Seung Cho, Dongmin Kim, Ye Seul Sim, Woohyung Lim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present extensive experimental results to demonstrate the efficacy of RaTab. We compare RaTab with a wide range of SOTA models, including various gradient-boosted decision trees (GBDTs) and DNNs. All experiments were performed on a single NVIDIA GeForce RTX 3090. We evaluate RaTab on 13 diverse datasets from OpenML (Vanschoren et al. 2014), including binary classification, multiclass classification, and regression tasks. In this section, we conduct a series of ablation studies and in-depth analyses to uncover the key characteristics of RaTab and elucidate the reasons behind its impressive performance. |
| Researcher Affiliation | Industry | LG AI Research, Seoul, Republic of Korea |
| Pseudocode | Yes | Algorithm 1: Self-supervised learning with RaTab. Input: dataset D. Parameters: epochs E, predefined rank k, structure of f_en, g_de, h_prj, loss weight λ. Output: pre-trained encoder f_en. 1: Initialize model parameters of f_en, g_de, and h_prj. 2: for epoch = 1 to E do 3: for each batch X ∈ D do 4: z ← f_en(X) 5: W ← ExtractWeights(f_en's last layer) 6: U, S, V ← SVD(W) 7: W_k ← U_k diag(SelectTopK(S, k)) V_k^T 8: update f_en's last-layer weights to W_k 9: z_aug ← f̂_en(X) 10: X̂ ← g_de(z) 11: L_recon ← MSE(X, X̂) 12: p, p_aug ← h_prj(z), h_prj(z_aug) 13: L_InfoNCE ← ContrastiveLoss(p, p_aug) 14: L_total ← λ·L_recon + (1 − λ)·L_InfoNCE 15: update parameters of f_en, g_de, and h_prj by minimizing L_total 16: end for 17: end for |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for RaTab, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate RaTab on 13 diverse datasets from OpenML (Vanschoren et al. 2014), including binary classification, multiclass classification, and regression tasks. |
| Dataset Splits | No | To optimize the depth and width of f_en for MLPs, we determine the best configuration based on validation performance in a supervised setup, training only the encoder with a linear head under a supervised loss, which preserves the unsupervised nature of our framework. We report average results over 10 finetuning runs with different random seeds. The paper does not provide specific percentages or methodologies for how the datasets were split into training, validation, and test sets for reproducibility. |
| Hardware Specification | Yes | All experiments were performed on a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions various models and techniques but does not specify the versions of any software libraries, frameworks, or programming languages used in their implementation. |
| Experiment Setup | Yes | Hyper-parameters: For each network and dataset, we determine the rank k by selecting p percent of the full rank. The value of p is chosen from a set of predefined percentages: {50%, 60%, 70%}. The encoder's dropout rate was fixed at 0.1 for all experiments. We used a learning rate of 0.001 for finetuning across all experiments. In our experiments, we set λ = 0.5 to give equal priority to both objectives. |
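The augmentation at the heart of Algorithm 1 (lines 6-8) is a rank-k SVD truncation of the encoder's last-layer weight matrix. A minimal NumPy sketch of that step is below; the function name `truncate_rank_k` and the matrix sizes are illustrative, not from the paper:

```python
import numpy as np

def truncate_rank_k(W, k):
    """Rank-k truncation via SVD: W_k = U_k diag(top-k of S) V_k^T
    (Algorithm 1, lines 6-8)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Stand-in for the encoder's last-layer weights (sizes are illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))

# With p = 50%, k is half of the full rank, i.e. k = 8 here.
Wk = truncate_rank_k(W, 8)
print(np.linalg.matrix_rank(Wk))  # 8
```

The truncated weights W_k replace the last layer of f_en to produce the augmented view z_aug; the original z and z_aug then feed the reconstruction and InfoNCE objectives, combined as λ·L_recon + (1 − λ)·L_InfoNCE with λ = 0.5.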