Talent: A Tabular Analytics and Learning Toolbox
Authors: Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Huai-Hong Yin, Tao Zhou, Jun-Peng Jiang, Han-Jia Ye
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Talent includes over 35 deep tabular prediction methods, offering various encoding and normalization modules, all within a unified, easily extensible interface. We demonstrate its design, application, and performance evaluation in case studies. The code is available at https://github.com/LAMDA-Tabular/TALENT. ... As an application of Talent, we conducted fair comparisons of representative methods on 300 benchmark datasets (Ye et al., 2024a), as detailed in Appendix D. |
| Researcher Affiliation | Academia | Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Huai-Hong Yin, Tao Zhou, Jun-Peng Jiang, Han-Jia Ye — School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, 210023, China |
| Pseudocode | Yes | `from tqdm import tqdm`<br>`from TALENT.model.utils import (get_deep_args, show_results, tune_hyper_parameters, get_method, set_seeds)`<br>`from TALENT.model.lib.data import get_dataset`<br>`args, default_para, opt_space = get_deep_args()`<br>`train_val_data, test_data, info = get_dataset(args.dataset, args.dataset_path)`<br>`if args.tune:`<br>`    args = tune_hyper_parameters(args, opt_space, train_val_data, info)`<br>`for seed in tqdm(range(args.seed_num)):`<br>`    args.seed = seed`<br>`    method = get_method(args.model_type)(args, info["task_type"] == "regression")`<br>`    time_cost = method.fit(train_val_data, info, train=True)`<br>`    vres, metric_name, predict_logits = method.predict(test_data, info)` |
| Open Source Code | Yes | We introduce Talent (Tabular Analytics and LEarNing Toolbox), a versatile toolbox for utilizing, analyzing, and comparing these methods. ... The code is available at https://github.com/LAMDA-Tabular/TALENT. |
| Open Datasets | Yes | As an application of Talent, we conducted fair comparisons of representative methods on 300 benchmark datasets (Ye et al., 2024a), as detailed in Appendix D. ... The datasets are available at Google Drive. ... This example illustrates how users can rapidly prototype and evaluate models using familiar tools and APIs, thereby enhancing the usability and accessibility of the Talent framework for researchers and practitioners alike. `import numpy as np`<br>`import openml` |
| Dataset Splits | Yes | The `get_dataset` function loads the specified dataset from the provided path, splits it into training/validation and test sets, and provides additional information about the dataset. ... The benchmark covers 300 tabular datasets (Ye et al., 2024a) drawn from diverse domains such as finance, education, and physics, encompassing binary classification, multi-class classification, and regression tasks. These datasets exhibit substantial variability in both the number of samples and the number of features, ensuring a broad assessment across different data characteristics. The detailed statistics are provided in Figure 4. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | Talent leverages open-source libraries to support its advanced data processing and machine learning functionalities, following the organized code structure introduced by RTDL (Gorishniy et al., 2021). For model optimization and hyperparameter tuning, it utilizes Optuna (Akiba et al., 2019). These carefully chosen dependencies offer users a powerful, flexible, and efficient toolkit for addressing various challenges in tabular data analysis. |
| Experiment Setup | Yes | It begins with data loading, followed by preprocessing, hyperparameter tuning, model training, prediction, and ultimately evaluation. This structured workflow ensures a smooth transition from raw data to meaningful results. ... If hyperparameter tuning is enabled, the `tune_hyper_parameters` function adjusts the arguments based on the optimization space and the training/validation data. ... Classification tasks are evaluated using metrics like Accuracy, F1-Score, Log Loss, and AUC, while regression tasks are assessed with MAE, RMSE, and R2 (Lewis-Beck, 2015). |
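The Dataset Splits row quotes the paper's description of `get_dataset` producing training/validation and test partitions. As a minimal sketch of what such a split involves, the stand-alone helper below (a hypothetical `split_dataset`, not Talent's actual implementation) shuffles indices with a fixed seed and carves out validation and test fractions:

```python
import random


def split_dataset(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle-and-slice train/validation/test split.

    Illustrative only: mimics the behavior the paper attributes to
    get_dataset, which returns training/validation and test sets.
    A fixed seed keeps the split reproducible across runs.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(samples) * test_ratio)
    n_val = int(len(samples) * val_ratio)
    test = [samples[i] for i in idx[:n_test]]
    val = [samples[i] for i in idx[n_test:n_test + n_val]]
    train = [samples[i] for i in idx[n_test + n_val:]]
    return train, val, test


train, val, test = split_dataset(list(range(100)), seed=42)
```

Fixing the seed per run, as the quoted workflow does with `args.seed`, is what makes the multi-seed evaluation loop (`for seed in range(args.seed_num)`) comparable across methods.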
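The Experiment Setup row lists MAE, RMSE, and R2 as the regression metrics. The following self-contained sketch re-implements those three formulas in plain Python for illustration (it is not Talent's evaluation code, which reports metrics via `method.predict`):

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 from paired targets and predictions.

    Illustrative re-implementation of the regression metrics named in
    the paper; not the toolbox's own evaluation routine.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n            # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                           # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


scores = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

For a constant prediction equal to the target mean, as above, R2 is exactly 0, the usual baseline against which the benchmarked regressors improve.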