Exponential Family Variational Flow Matching for Tabular Data Generation

Authors: Andrés Guzmán-Cordero, Floor Eijkelboom, Jan-Willem van de Meent

ICML 2025

Reproducibility assessment (each item gives the variable, the result, and the LLM's response):
Research Type: Experimental. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines. To demonstrate the effectiveness of EF-VFM, we introduce TabbyFlow, a model that achieves state-of-the-art performance on standard tabular benchmarks, improving both fidelity and diversity in synthetic data generation. We assess the quality of the synthetic data from four perspectives using a set of metrics widely adopted in previous studies (Zhang et al., 2024; Shi et al., 2024).
Researcher Affiliation: Collaboration. 1Vector Institute, 2Bosch-Delta Lab. Correspondence to: Andrés Guzmán-Cordero <EMAIL>, Floor Eijkelboom <EMAIL>.
Pseudocode: No. The paper does not contain explicitly labeled pseudocode or algorithm blocks; methodological steps are described in prose.
Open Source Code: No. The paper does not provide concrete access to source code (e.g., a specific repository link, an explicit code-release statement, or mention of code in supplementary materials).
Open Datasets: Yes. We use six tabular datasets from the UCI Machine Learning Repository: Adult, Default, Shoppers, Magic, Beijing, and News, where each tabular dataset is associated with a machine-learning task. The statistics of the datasets are presented in Table 9. (Footnote 2: https://archive.ics.uci.edu/datasets)
Dataset Splits: Yes. Following Kotelnikov et al. (2023), Zhang et al. (2024), and Shi et al. (2024), we split each dataset into real and test sets. For unconditional generation tasks, models are trained and evaluated on the real set. For machine-learning efficiency evaluation, we further split the real set into training and validation sets, while using the test set for final evaluation. The statistics of the datasets are presented in Table 9, including # Train, # Validation, and # Test.
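The two-stage split described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the split fractions and seed are assumptions, since the paper reports only the resulting counts in Table 9.

```python
import random

def split_dataset(rows, test_frac=0.1, val_frac=0.1, seed=0):
    """Two-stage split: first hold out a test set, then split the
    remaining 'real' set into train/validation for the machine-learning
    efficiency evaluation. Fractions are illustrative assumptions."""
    rng = random.Random(seed)
    rows = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    test, real = rows[:n_test], rows[n_test:]
    n_val = int(len(real) * val_frac)
    val, train = real[:n_val], real[n_val:]
    return train, val, test

# Example: 1000 rows -> 810 train / 90 validation / 100 test
train, val, test = split_dataset(list(range(1000)))
```

Fixing the seed keeps the real/test partition identical across the unconditional-generation and ML-efficiency evaluations, which both reuse the same real set.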
Hardware Specification: Yes. We perform our experiments on an Nvidia RTX A6000 GPU with 16GB of memory and implement TabbyFlow with PyTorch.
Software Dependencies: No. The paper mentions implementing TabbyFlow with PyTorch and using scikit-learn's QuantileTransformer, but does not provide specific version numbers for these software dependencies.
Experiment Setup: Yes. We keep the same training configuration across all datasets. All models train for 8,000 iterations using the Adam optimizer, with batch sizes of 4,096 during training and 10,000 during sampling. Similar to Shi et al. (2024), a weighting scheme that keeps the categorical loss weight constant while gradually reducing the numerical loss weight from one down to zero throughout training works best.
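The loss-weighting scheme can be sketched as a simple schedule. The linear decay below is an assumption: the paper only says the numerical loss weight is "gradually" reduced from one to zero, without specifying the decay shape.

```python
def loss_weights(step, total_steps=8000):
    """Per-iteration loss weights for the scheme described above:
    the categorical loss keeps a constant weight, while the numerical
    loss weight decays from 1 to 0 over the 8,000 training iterations.
    Linear decay is an illustrative assumption."""
    w_cat = 1.0
    w_num = max(0.0, 1.0 - step / total_steps)
    return w_cat, w_num

# Combined objective at a given step (losses are placeholders):
# w_cat, w_num = loss_weights(step)
# loss = w_cat * categorical_loss + w_num * numerical_loss
```

At step 0 both losses contribute fully; by the final iteration only the categorical term remains.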