Evolutionary Large Language Model for Automated Feature Transformation

Authors: Nanxu Gong, Chandan K Reddy, Wangyang Ying, Haifeng Chen, Yanjie Fu

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically demonstrate the effectiveness and generality of our proposed method. ... Our experimental results demonstrate the effectiveness and robustness of ELLM-FT. ... Table 1 shows the detailed statistics of the data sets. We adopted Random Forest (RF) as the downstream model. We used the F-1 score to measure the accuracy of classification tasks, and the 1 − relative absolute error (1-RAE) to measure the accuracy of regression tasks. We performed 5-fold stratified cross-validation to reduce random errors in experiments.
Researcher Affiliation | Collaboration | 1 Arizona State University, Tempe, USA; 2 Virginia Tech, Arlington, USA; 3 NEC Laboratories America, Princeton, USA
Pseudocode | Yes | Algorithm 1: RL-based data collection
Open Source Code | Yes | Code: https://github.com/NanxuGong/ELLM-FT
Open Datasets | Yes | Data Descriptions. We collected 12 datasets from UC Irvine, LibSVM, Kaggle, and OpenML. We evaluated our method and baseline methods on two major predictive tasks: 1) Classification (C); and 2) Regression (R). Table 1 shows the detailed statistics of the data sets.
Dataset Splits | Yes | We performed 5-fold stratified cross-validation to reduce random errors in experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running the experiments. It only mentions using 'Llama-2-13B-chat-hf' as the backbone LLM.
Software Dependencies | No | The paper mentions 'Llama-2-13B-chat-hf' as the backbone LLM and 'Random Forest (RF)' as the downstream model, but it does not provide specific version numbers for any software libraries, programming languages, or other dependencies.
Experiment Setup | No | The paper describes evaluation metrics (F-1 score, 1-RAE) and cross-validation (5-fold stratified cross-validation), but it does not provide specific numerical hyperparameters for the LLM (e.g., prompt iterations, specific number of few-shot examples, M and T for Algorithm 1), the RL data collector, or the downstream Random Forest model's configuration.
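The evaluation protocol quoted above (Random Forest as the downstream model, F-1 for classification, 1-RAE for regression, 5-fold stratified cross-validation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic dataset, random seeds, and default Random Forest settings are assumptions, since the paper does not report these details.

```python
# Sketch of the stated evaluation protocol: RF downstream model,
# F-1 (classification) / 1-RAE (regression), 5-fold stratified CV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def one_minus_rae(y_true, y_pred):
    """1 - relative absolute error; higher is better, like F-1."""
    return 1.0 - np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()


# Stand-in dataset; the paper uses 12 real datasets (Table 1).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean F-1 over 5 folds: {np.mean(scores):.3f}")
```

For a regression dataset, the loop would instead fit a `RandomForestRegressor` and score each fold with `one_minus_rae`, so that both tasks report a higher-is-better metric.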