T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data

Authors: Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results demonstrate a substantial improvement in both classification and regression tasks, outperforming models trained directly on samples in their original data space. 4 EXPERIMENTS 4.1 EXPERIMENTAL SETTING Datasets Following previous work (Ye et al., 2024), we experiment on 7 datasets with heterogeneous features to test the effectiveness of T-JEPA. We test our approach on several supervised tabular deep learning tasks such as binary and multi-class classification, as well as regression. We use as performance metrics Accuracy (↑) and RMSE (↓) for classification and regression, respectively. Table 1: Performance metrics for different downstream models trained on the original data space and the generated T-JEPA representation across datasets.
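The two metrics quoted above are standard; as a minimal sketch (not from the paper's codebase, function names are illustrative), they can be computed as:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # fraction of correctly classified samples (higher is better, ↑)
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def rmse(y_true, y_pred):
    # root mean squared error for regression targets (lower is better, ↓)
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```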
Researcher Affiliation Collaboration 1 Université Paris-Saclay, CNRS, Centrale Supélec, Laboratoire Interdisciplinaire des Sciences du Numérique, 91190, Gif-sur-Yvette, France. 2 Emobot, France. {name}.{surname}@centralesupelec.fr
Pseudocode No The paper describes the T-JEPA training pipeline (Figure 1) and the formal equations for its components (equations 1-5). It outlines the steps of the method in Section 3, but it does not present a structured, explicitly labeled pseudocode block or algorithm.
Open Source Code Yes All experiments detailed in the present work can be reproduced using the following code: https://github.com/jose-melo/t-jepa
Open Datasets Yes The datasets we include in our experiments are Adult (AD) (Kohavi et al., 1996), Higgs (HI) (Vanschoren et al., 2014), Helena (HE) (Guyon et al., 2019), Jannis (JA) (Guyon et al., 2019), ALOI (AL) (Geusebroek et al., 2005) and California housing (CA) (Pace and Barry, 1997). We also add MNIST (interpreted as tabular data) to our benchmark following Yoon et al. (2020).
Dataset Splits Yes T-JEPA Training We split each dataset into training/validation/test sets (80/10/10) which were used for selecting both the hyperparameters of T-JEPA and of the models used for the downstream task.
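The 80/10/10 split described above can be sketched as follows (a minimal illustration with an assumed fixed seed, not the paper's actual splitting code):

```python
import numpy as np

def split_80_10_10(n_samples, seed=0):
    # shuffle sample indices and cut them into 80/10/10
    # train/validation/test partitions
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.8 * n_samples)
    n_val = int(0.1 * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_80_10_10(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```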
Hardware Specification Yes The training was done on a single NVIDIA HGX A100 GPU with 40GB of memory.
Software Dependencies Yes Table 4: Main libraries used in the project.
Python v3.12.2 — The programming language used for the project
einops v0.8.0 — A flexible and powerful tool for tensor operations
matplotlib v3.8.4 — A library for creating static, animated, and interactive plots
numpy v2.1.0 — Fundamental package for scientific computing with arrays
pandas v2.2.2 — Data manipulation and analysis tool
pytorch_lightning v2.2.1 — A PyTorch wrapper for high-performance deep learning research
scikit_learn v1.4.1.post1 — Machine learning library for Python
scipy v1.14.1 — Library for scientific and technical computing
torch v2.3.0.post301 — PyTorch deep learning library
torchinfo v1.8.0 — Module to show model summaries in PyTorch
tqdm v4.66.2 — Progress bar utility for Python
xgboost v2.1.1 — Optimized gradient boosting library
Experiment Setup Yes We employed Bayesian optimization to tune the hyperparameters of T-JEPA. The batch size was fixed at 512 for all configurations, while the exponential moving average (EMA) decay rate was set to vary from 0.996 to 1. Additionally, we used four prediction masks throughout the training process. For optimization, we selected the AdamW optimizer (Loshchilov and Hutter, 2019) due to its proven robustness in large-scale models. The learning rate was adaptively adjusted using a cosine annealing scheduler (Loshchilov and Hutter, 2017), which gradually reduced it from the initial value to a minimum, ηmin = 0. Table 5: Hyperparameter Configuration for Bayesian Optimization
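The schedules quoted above can be sketched in a few lines. This is a minimal illustration of cosine annealing with ηmin = 0 and of an EMA target-encoder update; the linear ramp of the EMA decay rate from 0.996 to 1 is an assumption (the excerpt only states the range), and all names here are illustrative rather than taken from the released code.

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    # cosine annealing from lr_max down to lr_min (eta_min = 0 in the paper)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def ema_momentum(step, total_steps, m_start=0.996, m_end=1.0):
    # assumed linear ramp of the EMA decay rate from 0.996 to 1 over training
    return m_start + (m_end - m_start) * step / total_steps

def ema_update(target_params, context_params, m):
    # target-encoder parameters track the context encoder via an EMA
    return [m * t + (1 - m) * c for t, c in zip(target_params, context_params)]

print(cosine_lr(0, 100, 1e-3))    # lr_max at the first step
print(cosine_lr(100, 100, 1e-3))  # decays to eta_min = 0 at the last step
```

In a PyTorch training loop these two pieces correspond to `torch.optim.lr_scheduler.CosineAnnealingLR` (with `eta_min=0`) on an `AdamW` optimizer, plus a manual EMA step after each update of the context encoder.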