Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data
Authors: Moonjung Eo, Kyungeun Lee, Hye-Seung Cho, Dongmin Kim, Ye Seul Sim, Woohyung Lim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present extensive experimental results to demonstrate the efficacy of RaTab. We compare RaTab with a wide range of SOTA models, including various gradient-boosted decision trees (GBDTs) and DNNs. All experiments were performed on a single NVIDIA GeForce RTX 3090. We evaluate RaTab on 13 diverse datasets from OpenML (Vanschoren et al. 2014), including binary classification, multiclass classification, and regression tasks. In this section, we conduct a series of ablation studies and in-depth analyses to uncover the key characteristics of RaTab and elucidate the reasons behind its impressive performance. |
| Researcher Affiliation | Industry | LG AI Research, Seoul, Republic of Korea |
| Pseudocode | Yes | Algorithm 1: Self-supervised learning with RaTab. Input: dataset D. Parameters: epochs E, predefined rank k, structure of f_en, g_de, h_prj, loss weight λ. Output: pre-trained encoder f_en. 1: Initialize model parameters of f_en, g_de, and h_prj. 2: for epoch = 1 to E do 3: for each batch X ∈ D do 4: z ← f_en(X) 5: W ← ExtractWeights(f_en's last layer) 6: U, S, V ← SVD(W) 7: W_k ← U_k diag(SelectTopK(S, k)) V_k^T 8: update f_en's last-layer weights to W_k 9: z_aug ← f̂_en(X) 10: X̂ ← g_de(z) 11: L_recon ← MSE(X, X̂) 12: p, p_aug ← h_prj(z), h_prj(z_aug) 13: L_InfoNCE ← ContrastiveLoss(p, p_aug) 14: L_total ← λ·L_recon + (1 − λ)·L_InfoNCE 15: update parameters of f_en, g_de, and h_prj by minimizing L_total 16: end for 17: end for |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for RaTab, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate RaTab on 13 diverse datasets from OpenML (Vanschoren et al. 2014), including binary classification, multiclass classification, and regression tasks. |
| Dataset Splits | No | To optimize the depth and width of f_en for MLPs, we determine the best configuration based on validation performance in a supervised setup, training only the encoder with a linear head under a supervised loss, which preserves the unsupervised nature of our framework. We report average results over 10 finetuning runs with different random seeds. The paper does not provide specific percentages or methodologies for how the datasets were split into training, validation, and test sets for reproducibility. |
| Hardware Specification | Yes | All experiments were performed on a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions various models and techniques but does not specify the versions of any software libraries, frameworks, or programming languages used in their implementation. |
| Experiment Setup | Yes | Hyper-parameters: For each network and dataset, we determine the rank k by selecting p percent of the full rank. The value of p is chosen from a set of predefined percentages: {50%, 60%, 70%}. The encoder's dropout rate was fixed at 0.1 for all experiments. We used a learning rate of 0.001 for finetuning across all experiments. In our experiments, we set λ = 0.5 to give equal priority to both objectives. |
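The augmentation at the heart of Algorithm 1 (lines 6-8) is a rank-k SVD truncation of the encoder's last-layer weight matrix. A minimal NumPy sketch of that step is below; the function name `truncate_rank_k` and the matrix sizes are illustrative, not from the paper:

```python
import numpy as np

def truncate_rank_k(W, k):
    """Rank-k truncation via SVD: W_k = U_k diag(top-k of S) V_k^T
    (Algorithm 1, lines 6-8)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Stand-in for the encoder's last-layer weights (sizes are illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))

# With p = 50%, k is half of the full rank, i.e. k = 8 here.
Wk = truncate_rank_k(W, 8)
print(np.linalg.matrix_rank(Wk))  # 8
```

The truncated weights W_k replace the last layer of f_en to produce the augmented view z_aug; the original z and z_aug then feed the reconstruction and InfoNCE objectives, combined as λ·L_recon + (1 − λ)·L_InfoNCE with λ = 0.5.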