T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Authors: Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate a substantial improvement in both classification and regression tasks, outperforming models trained directly on samples in their original data space. [Section 4.1, Experimental Setting, Datasets:] Following previous work (Ye et al., 2024), we experiment on 7 datasets with heterogeneous features to test the effectiveness of T-JEPA. We test our approach on several supervised tabular deep learning tasks such as binary and multi-class classification, as well as regression. We use as performance metrics Accuracy (↑) and RMSE (↓) for classification and regression respectively. Table 1: Performance metrics for different downstream models trained on the original data space and the generated T-JEPA representations across datasets. |
| Researcher Affiliation | Collaboration | 1 Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire Interdisciplinaire des Sciences du Numérique, 91190 Gif-sur-Yvette, France. 2 Emobot, France. {name}.{surname}@centralesupelec.fr |
| Pseudocode | No | The paper describes the T-JEPA training pipeline (Figure 1) and the formal equations for its components (equations 1-5). It outlines the steps of the method in Section 3, but it does not present a structured, explicitly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Each experiment detailed in the present work can be reproduced using the following code: https://github.com/jose-melo/t-jepa |
| Open Datasets | Yes | The datasets we include in our experiments are Adult (AD) (Kohavi et al., 1996), Higgs (HI) (Vanschoren et al., 2014), Helena (HE) (Guyon et al., 2019), Jannis (JA) (Guyon et al., 2019), ALOI (AL) (Geusebroek et al., 2005) and California housing (CA) (Pace and Barry, 1997). We also add MNIST (interpreted as tabular data) to our benchmark following Yoon et al. (2020). |
| Dataset Splits | Yes | T-JEPA Training We split each dataset into training/validation/test sets (80/10/10) which were used for selecting both the hyperparameters of T-JEPA and of the models used for the downstream task. |
| Hardware Specification | Yes | The training was done on a single NVIDIA HGX A100 GPU with 40GB of memory. |
| Software Dependencies | Yes | Table 4: Main libraries used in the project. Python v3.12.2 (the programming language used for the project); einops v0.8.0 (flexible and powerful tensor operations); matplotlib v3.8.4 (static, animated, and interactive plots); numpy v2.1.0 (fundamental package for scientific computing with arrays); pandas v2.2.2 (data manipulation and analysis); pytorch_lightning v2.2.1 (a PyTorch wrapper for high-performance deep learning research); scikit_learn v1.4.1.post1 (machine learning library for Python); scipy v1.14.1 (scientific and technical computing); torch v2.3.0.post301 (PyTorch deep learning library); torchinfo v1.8.0 (model summaries in PyTorch); tqdm v4.66.2 (progress bar utility); xgboost v2.1.1 (optimized gradient boosting library). |
| Experiment Setup | Yes | We employed Bayesian optimization to tune the hyperparameters of T-JEPA. The batch size was fixed at 512 for all configurations, while the exponential moving average (EMA) decay rate was set to vary from 0.996 to 1. Additionally, we used four prediction masks throughout the training process. For optimization, we selected the AdamW optimizer (Loshchilov and Hutter, 2019) due to its proven robustness in large-scale models. The learning rate was adaptively adjusted using a cosine annealing scheduler (Loshchilov and Hutter, 2017), which gradually reduced it from the initial value to a minimum, ηmin = 0. Table 5: Hyperparameter Configuration for Bayesian Optimization |
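The 80/10/10 train/validation/test split reported under "Dataset Splits" can be reproduced with two chained `train_test_split` calls. This is a minimal sketch, not the authors' code: the random data, feature count, and `random_state` are illustrative placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for one of the benchmark tabular datasets.
rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.integers(0, 2, size=1000)

# First split off 20%, then halve it: 80% train / 10% validation / 10% test,
# as used for selecting both T-JEPA and downstream-model hyperparameters.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```

For classification datasets, passing `stratify=y` (and `stratify=y_tmp` on the second call) would preserve class proportions across the three splits.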
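The two schedules named in the experiment setup (an EMA decay ramping from 0.996 to 1, and cosine annealing of the learning rate down to ηmin = 0) can be sketched as closed-form functions. This is a hedged illustration: the paper gives only the endpoints, so the linear EMA ramp, the step counts, and the initial learning rate below are assumptions, not the authors' exact schedule.

```python
import math

def ema_decay(step, total_steps, start=0.996, end=1.0):
    """Assumed linear ramp of the EMA decay for the target encoder."""
    return start + (end - start) * step / total_steps

def cosine_lr(step, total_steps, lr_init, eta_min=0.0):
    """Cosine annealing from lr_init down to eta_min (here eta_min = 0)."""
    return eta_min + 0.5 * (lr_init - eta_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )

# Endpoints match the reported configuration: decay 0.996 -> 1.0,
# learning rate lr_init -> 0 over training.
print(ema_decay(0, 1000), ema_decay(1000, 1000))
print(cosine_lr(0, 1000, 1e-3), cosine_lr(1000, 1000, 1e-3))
```

In a PyTorch training loop, the second function corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=0`, while the EMA decay would be applied per step when updating the target-encoder weights.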