LEKA: LLM-Enhanced Knowledge Augmentation

Authors: Xinhao Zhang, Jinghan Zhang, Fengran Mo, Dongjie Wang, Yanjie Fu, Kunpeng Liu

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We validate the effectiveness of our approach through extensive experiments across various domains and demonstrate significant improvements over traditional methods in automating data alignment and optimizing transfer learning outcomes. We conduct a series of experiments to validate the effectiveness and robustness of our LEKA method across different tasks. Experimental results demonstrate that our method has clear advantages over existing methods. In this section, we present four experiments to demonstrate the effectiveness and impacts of the LEKA. First, we compare the LEKA against several baseline methods on four downstream tasks. Second, we present the correlations between several target domains and their retrieved source domains. Finally, we discuss the reason for performance improvement. We evaluate our method on four datasets of medical and economic domains: (1) Breast Cancer Wisconsin (Diagnostic) (BCW) [Wolberg et al., 1995], (2) Heart Disease (HD) [Janosi et al., 1989], (3) Vehicle Insurance Data (VID) [Bhatt, 2019], and (4) Telco Customer Churn (TCC) [Blast Char, 2018]. We show the detailed information about the features of the datasets in Table 1. We evaluate the model performance by the following metrics: Overall Accuracy (Acc) measures the proportion of true results (both true positives and true negatives) in the total dataset. Precision (Prec) reflects the ratio of true positive predictions to all positive predictions for each class. Recall (Rec), also known as sensitivity, reflects the ratio of true positive predictions to all actual positives for each class. F-Measure (F1) is the harmonic mean of precision and recall, calculated here as the macro-average. 
We apply the LEKA across a range of models: 1) Tabnet (TN) [Arik and Pfister, 2021]; 2) Tab Transformer (TT) [Huang et al., 2020]; 3) Random Forest (RF) [Rigatti, 2017]; 4) Gradient Boosting Decision Trees (GBDT) [Lin et al., 2023]; 5) XGBoost (XB) [Chen and Guestrin, 2016]. We compare the performance in these tasks both with and without our method.
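The four quoted metrics (Acc, and macro-averaged Prec/Rec/F1) can be reproduced directly. A minimal sketch using scikit-learn on a toy binary prediction vector (the labels here are illustrative, not from the paper's data):

```python
# Sketch of the evaluation metrics quoted above: overall accuracy plus
# macro-averaged precision, recall, and F1, as the paper specifies for F1.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # toy model predictions

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
rec  = recall_score(y_true, y_pred, average="macro")
f1   = f1_score(y_true, y_pred, average="macro")  # macro-average, per the paper

print(f"Acc={acc:.3f} Prec={prec:.3f} Rec={rec:.3f} F1={f1:.3f}")
# → Acc=0.750 Prec=0.750 Rec=0.750 F1=0.750
```

`average="macro"` computes each class's score separately and takes the unweighted mean, which matches the paper's stated macro-averaged F-Measure.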
Researcher Affiliation Academia Xinhao Zhang1, Jinghan Zhang1, Fengran Mo2, Dongjie Wang3, Yanjie Fu4 and Kunpeng Liu1; 1Portland State University, USA; 2University of Montreal, Canada; 3University of Kansas, USA; 4Arizona State University, USA
Pseudocode No The paper describes the methodology in text and uses a framework diagram (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code No The paper does not explicitly state that the source code for the LEKA methodology is available, nor does it provide a link to a code repository. It mentions using GPT-4o and Exa API for query generation and data fetching, but this refers to third-party tools, not the authors' own implementation code for LEKA.
Open Datasets Yes We evaluate our method on four datasets of medical and economic domains: (1) Breast Cancer Wisconsin (Diagnostic) (BCW) [Wolberg et al., 1995], (2) Heart Disease (HD) [Janosi et al., 1989], (3) Vehicle Insurance Data (VID) [Bhatt, 2019], and (4) Telco Customer Churn (TCC) [Blast Char, 2018]. We show the detailed information about the features of the datasets in Table 1. ... In our setup for data synthesis and model training, we utilize GPT-4o [Open AI, 2024] as the query generator, combined with the Exa API [Exa, 2024] to fetch web pages containing datasets from Kaggle [Kaggle, 2024] and the UCI Machine Learning Repository [University of California, Irvine, 2024] that may be suitable for knowledge transfer.
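All four datasets are indeed public. The BCW data, for instance, is the same Breast Cancer Wisconsin (Diagnostic) set [Wolberg et al., 1995] that ships with scikit-learn, so a quick sanity load is possible without the UCI or Kaggle pipelines the paper uses (a sketch, not the authors' setup):

```python
# Hedged sketch: loading the public BCW dataset via scikit-learn's bundled
# copy. The other three datasets (HD, VID, TCC) are hosted on the UCI
# repository and Kaggle and would be fetched as CSVs instead.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)                   # (569, 30): 569 samples, 30 numeric features
print(list(data.target_names))   # ['malignant', 'benign']
```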
Dataset Splits No The paper mentions batch sizes for training and the number of epochs: 'For our models, we configure TN, TT, and FTT with a batch size of 512 for the VID and TCC datasets and a batch size of 32 for the BCW dataset, a maximum of 100 epochs, and employ early stopping with a patience of 20.' However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for any of the datasets used.
Hardware Specification No The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions several software components like 'GPT-4o [Open AI, 2024]', 'Exa API [Exa, 2024]', and 'pytorch tabnet', but it does not specify concrete version numbers for these or any other libraries, frameworks, or programming languages used in the implementation.
Experiment Setup Yes For our models, we configure TN, TT, and FTT with a batch size of 512 for the VID and TCC datasets and a batch size of 32 for the BCW dataset, a maximum of 100 epochs, and employ early stopping with a patience of 20. The learning rate is set at the default 0.02 for pytorch tabnet. For the RF and GBDT models, the number of trees is set to 100, with GBDT also configured with a learning rate of 0.1 and a max depth of 3. TTab is set with a maximum of 50 epochs, a learning rate of 1×10⁻³, and a weight decay of 1×10⁻⁴.
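The quoted settings map directly onto library arguments. A configuration sketch, assuming the `pytorch_tabnet` package the paper names and hypothetical training arrays `X_train`/`y_train` (not provided by the paper):

```python
# Configuration sketch only, mirroring the quoted setup: TabNet at the
# default learning rate of 0.02, batch size 512 (VID/TCC) or 32 (BCW),
# at most 100 epochs, early stopping with patience 20; RF/GBDT alongside.
import torch
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

tabnet = TabNetClassifier(optimizer_fn=torch.optim.Adam,
                          optimizer_params=dict(lr=2e-2))  # default 0.02
# tabnet.fit(X_train, y_train,
#            max_epochs=100, patience=20, batch_size=512)  # 32 for BCW

rf   = RandomForestClassifier(n_estimators=100)            # 100 trees
gbdt = GradientBoostingClassifier(n_estimators=100,        # 100 trees,
                                  learning_rate=0.1,       # lr 0.1,
                                  max_depth=3)             # max depth 3
```

Note that, consistent with the "Dataset Splits: No" finding above, nothing in the paper pins down how `X_train`/`y_train` would be carved out of each dataset.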