TabWak: A Watermark for Tabular Diffusion Models
Authors: Chaoyi Zhu, Jiayi Tang, Jeroen Galjaard, Pin-Yu Chen, Robert Birke, Cornelis Bos, Lydia Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TabWak on five datasets against baselines to show that the quality of watermarked tables remains nearly indistinguishable from non-watermarked tables while achieving high detectability in the presence of strong post-editing attacks, with a 100% true positive rate at a 0.1% false positive rate on synthetic tables with fewer than 300 rows. |
| Researcher Affiliation | Collaboration | Chaoyi Zhu (TU Delft), Jiayi Tang (TU Delft), Jeroen Galjaard (TU Delft), Pin-Yu Chen (IBM Research), Robert Birke (University of Turin), Cornelis Bos (TU Delft, Tata Steel Research), Lydia Y. Chen (TU Delft, University of Neuchâtel) |
| Pseudocode | Yes | Algorithm 1 Tree-Ring Embedding in Tabular Data |
| Open Source Code | Yes | Our code is available at the following repository: https://github.com/chaoyitud/TabWak. |
| Open Datasets | Yes | We used five widely utilized tabular datasets to evaluate the performance of the proposed TabWak on synthetic data quality, the effectiveness of its watermark detection, and its robustness against post-editing attacks. These include: Shoppers (Sakar & Kastro, 2018), Magic (Bock, 2007), Credit (Yeh, 2016), Adult (Becker & Kohavi, 1996), and Diabetes (Strack et al., 2014). |
| Dataset Splits | No | The paper mentions splitting data for discriminability evaluation: "First, all rows from both the real and synthetic datasets are combined and then split into training and validation sets. The machine learning model is trained on the training set and evaluated on the validation sets." However, it does not specify explicit percentages or exact counts for these splits or for the training-on-synthetic, test-on-real setting, making reproducibility of the exact splits difficult. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, to run the experiments. It only mentions general computing environments like 'HPC' in the acknowledgements. |
| Software Dependencies | No | The paper mentions using a 'Tabsyn framework' and 'Denoising Diffusion Probabilistic Model (DDPM)' and 'Denoising Diffusion Implicit Models (DDIM)' as methodologies. It also references 'Transformer architecture' and 'multi-layer perceptron (MLP)'. However, it does not specify particular software packages or libraries with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that were used in the implementation. |
| Experiment Setup | Yes | In more detail, for the model we used, the autoencoder module comprises an encoder and a decoder, each following a 2-layer Transformer architecture. The hidden dimension of the Transformer's feed-forward network (FFN) is set to 128. The diffusion model comprises a 4-layer multi-layer perceptron (MLP) with a hidden dimension of 1024. For both the diffusion and sampling processes within the diffusion model, 1000 timesteps are used. With these hyperparameters, the latent tabular model consistently generates high-quality synthetic data in the absence of watermarking, achieving similarity metrics above 0.88, discriminability metrics above 0.63, and utility metrics around 0.79 across all datasets. Therefore, the same architecture is employed for all datasets, while the number of training epochs is tuned for each dataset individually. |
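The hyperparameters reported under "Experiment Setup" can be collected into a configuration sketch. This is a minimal illustration for reproduction purposes only; the class and field names below are hypothetical and are not taken from the TabWak codebase.

```python
from dataclasses import dataclass


@dataclass
class LatentTabularConfig:
    """Hypothetical config mirroring the paper's reported hyperparameters."""
    # Autoencoder: encoder and decoder are each 2-layer Transformers,
    # with an FFN hidden dimension of 128.
    transformer_layers: int = 2
    ffn_hidden_dim: int = 128
    # Diffusion model: a 4-layer MLP with hidden dimension 1024.
    mlp_layers: int = 4
    mlp_hidden_dim: int = 1024
    # 1000 timesteps for both the diffusion and sampling processes.
    timesteps: int = 1000


cfg = LatentTabularConfig()
print(cfg.ffn_hidden_dim, cfg.mlp_hidden_dim, cfg.timesteps)
```

Note that per the paper, this single architecture is shared across datasets; only the number of training epochs is tuned per dataset, so a reproduction would vary that one value while keeping the fields above fixed.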