Geodesic Flow Kernels for Semi-Supervised Learning on Mixed-Variable Tabular Dataset
Authors: Yoontae Hwang, Yongjae Lee
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To rigorously evaluate GFTab, we curate a comprehensive set of 21 tabular datasets spanning various domains, sizes, and variable compositions. Our experimental results show that GFTab outperforms existing ML/DL models across many of these datasets, particularly in settings with limited labeled data. |
| Researcher Affiliation | Academia | 1 University of Oxford; 2 Ulsan National Institute of Science and Technology (UNIST) |
| Pseudocode | No | The paper describes the methodology using natural language and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Yoontae6719/Geodesic-Flow-Kernels-for-Semi-Supervised-Learning-on-Mixed-Variable-Tabular-Dataset |
| Open Datasets | Yes | We selected 21 datasets after carefully reviewing more than 4,000 datasets, including OpenML (3,953 datasets), AMLB (Gijsbers et al. 2022) (71 datasets), and (Grinsztajn, Oyallon, and Varoquaux 2022) (22 datasets). |
| Dataset Splits | No | The paper mentions evaluating GFTab under "20% labeled training data" and "10% labeled" settings, but it does not specify the overall training/validation/test splits for the datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions referring to Appendix B for other settings, but Appendix B is not included in the provided text. |
| Software Dependencies | No | The paper mentions using XGBoost (Chen and Guestrin 2016), CatBoost (Prokhorenkova et al. 2018), and Optuna (Akiba et al. 2019) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | GFTab is trained in a semi-supervised manner to minimize L_GFTab = L_sim + βL_ce. We compared the performance of the model across various values of β; β = 1.0 yielded the best balance. |
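For concreteness, the weighted objective quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration of the L_GFTab = L_sim + βL_ce combination only; the function name and the example loss values are hypothetical and are not taken from the paper's implementation:

```python
def gftab_loss(sim_loss: float, ce_loss: float, beta: float = 1.0) -> float:
    """Combined semi-supervised objective: L_GFTab = L_sim + beta * L_ce.

    sim_loss -- similarity-based loss term (hypothetical scalar here)
    ce_loss  -- cross-entropy loss on the labeled subset (hypothetical scalar)
    beta     -- trade-off weight; the report notes beta = 1.0 performed best
    """
    return sim_loss + beta * ce_loss

# With beta = 1.0 the two terms are weighted equally.
print(gftab_loss(0.8, 0.5))
```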