Evolutionary Large Language Model for Automated Feature Transformation

Authors: Nanxu Gong, Chandan K Reddy, Wangyang Ying, Haifeng Chen, Yanjie Fu

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically demonstrate the effectiveness and generality of our proposed method. ... Our experimental results demonstrate the effectiveness and robustness of ELLM-FT. ... Table 1 shows the detailed statistics of the data sets. We adopted Random Forest (RF) as the downstream model. We used the F-1 score to measure the accuracy of classification tasks, and the 1 − relative absolute error (1-RAE) to measure the accuracy of regression tasks. We performed 5-fold stratified cross-validation to reduce random errors in experiments.
Researcher Affiliation | Collaboration | 1 Arizona State University, Tempe, USA; 2 Virginia Tech, Arlington, USA; 3 NEC Laboratories America, Princeton, USA
Pseudocode | Yes | Algorithm 1: RL-based data collection
Open Source Code | Yes | Code: https://github.com/NanxuGong/ELLM-FT
Open Datasets | Yes | Data Descriptions. We collected 12 datasets from UC Irvine, LibSVM, Kaggle, and OpenML. We evaluated our method and baseline methods on two major predictive tasks: 1) Classification (C); and 2) Regression (R). Table 1 shows the detailed statistics of the data sets.
Dataset Splits | Yes | We performed 5-fold stratified cross-validation to reduce random errors in experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running the experiments. It only mentions using 'Llama-2-13B-chat-hf' as the backbone LLM.
Software Dependencies | No | The paper mentions 'Llama-2-13B-chat-hf' as the backbone LLM and 'Random Forest (RF)' as the downstream model, but it does not provide specific version numbers for any software libraries, programming languages, or other dependencies.
Experiment Setup | No | The paper describes evaluation metrics (F-1 score, 1-RAE) and cross-validation (5-fold stratified cross-validation), but it does not provide specific numerical hyperparameters for the LLM (e.g., prompt iterations, specific number of few-shot examples, M and T for Algorithm 1), the RL data collector, or the downstream Random Forest model's configuration.
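The evaluation protocol quoted above (Random Forest as the downstream model, F-1 for classification, 1-RAE for regression, 5-fold stratified cross-validation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic dataset, random seeds, and default Random Forest settings are assumptions, since the paper does not report these details.

```python
# Sketch of the stated evaluation protocol: RF downstream model,
# F-1 (classification) / 1-RAE (regression), 5-fold stratified CV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def one_minus_rae(y_true, y_pred):
    """1 - relative absolute error; higher is better, like F-1."""
    return 1.0 - np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()


# Stand-in dataset; the paper uses 12 real datasets (Table 1).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean F-1 over 5 folds: {np.mean(scores):.3f}")
```

For a regression dataset, the loop would instead fit a `RandomForestRegressor` and score each fold with `one_minus_rae`, so that both tasks report a higher-is-better metric.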