Dynamic and Adaptive Feature Generation with LLM

Authors: Xinhao Zhang, Jinghan Zhang, Banafsheh Rekabdar, Yuanchun Zhou, Pengfei Wang, Kunpeng Liu

IJCAI 2025

Reproducibility Variable / Result / LLM Response
Research Type: Experimental
We conduct a series of experiments to validate the effectiveness and robustness of our method across different datasets and downstream tasks. Our results demonstrate that our method has clear advantages over existing methods and considerable potential to promote a broader range of feature engineering tasks. In this section, we present three experiments to demonstrate the effectiveness of the LFG. First, we compare LFG against several baseline methods on multiple downstream classification tasks. Second, we perform a robustness check on LFG's performance improvement. Finally, we further study and analyze iterative performance improvements in experimental results and discuss their reasons.
Researcher Affiliation: Academia
(1) Portland State University; (2) Computer Network Information Center, Chinese Academy of Sciences; (3) University of Chinese Academy of Sciences, Chinese Academy of Sciences
Pseudocode: No
The paper includes figures (Figure 2, Figure 3) illustrating the workflow and MCTS process, but these are diagrams, not structured pseudocode or algorithm blocks.
Open Source Code: No
The paper mentions using the "Open AI API, GPT-3.5 Turbo model [Open AI, 2024]" for experiments, which refers to a third-party tool. There is no explicit statement about the authors releasing their own source code for the methodology described, nor is there a link to a code repository.
Open Datasets: Yes
We evaluate the LFG method on four real-world datasets from UCI, including Ionosphere (Ino) [Sigillito et al., 1989], Amazon Commerce Reviews (Ama) [Liu, 2011], and Abalone (Aba) [Nash et al., 1995], as well as the Diabetes Health Indicators Dataset (Dia) [Teboul, 2022] from Kaggle. The detailed information is shown in Table 1.
Dataset Splits: Yes
For each dataset, we randomly selected 55% of the data for training. We validate the experiments through five-fold cross-validation.
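The reported protocol (a random 55% training split, evaluated with five-fold cross-validation) can be sketched as follows. This is not the authors' code; it is a minimal illustration assuming scikit-learn, using a stand-in dataset in place of the UCI/Kaggle datasets named in the paper.

```python
from sklearn.datasets import load_breast_cancer  # stand-in dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Randomly select 55% of the data for training, as the paper reports.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.55, random_state=0)

# Validate with five-fold cross-validation on the training portion.
clf = RandomForestClassifier(random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(round(scores.mean(), 3))
```

Random Forest is one of the downstream classifiers the assessment lists (alongside DT, KNN, and MLP); whether the cross-validation folds are drawn from the training portion or the full data is an assumption here, as the report does not specify.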
Hardware Specification: No
The paper mentions performing experiments on the "Open AI API, GPT-3.5 Turbo model [Open AI, 2024]". This indicates the use of OpenAI's cloud-based service, but does not provide any specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies: No
The paper mentions using the "Open AI API, GPT-3.5 Turbo model [Open AI, 2024]" and classification models such as Random Forests (RF), Decision Tree (DT), K-Nearest Neighbor (KNN), and Multilayer Perceptrons (MLP). However, it does not specify version numbers for these software components or for other libraries/frameworks used.
Experiment Setup: Yes
Here, we set the same operation set for both the LFG and RL methods, consisting of square root, square, cosine, sine, tangent, exp, cube, log, reciprocal, sigmoid, plus, subtract, multiply, and divide. For our LLM, we perform all the experiments on the Open AI API, GPT-3.5 Turbo model [Open AI, 2024]. We compare the performance outcomes in these models both with and without our method. Specifically, we show the results of LFG in 3 iterations of generation, denoted as LFG-3, and compare it with the full LFG at T = 10. We validate the experiments through five-fold cross-validation.
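The operation set above can be made concrete with a short sketch: each unary operation maps one feature column to a new column, and each binary operation combines two columns. This is an illustrative NumPy implementation only; the `generate_features` helper and the numerical guards (abs, clip, epsilon) are assumptions, and LFG's actual LLM-guided choice of which operations to apply is not shown.

```python
import numpy as np

# Unary and binary operations from the paper's operation set, with simple
# guards against invalid inputs (negative sqrt/log, overflow, divide-by-zero).
UNARY_OPS = {
    "square_root": lambda x: np.sqrt(np.abs(x)),
    "square": np.square,
    "cosine": np.cos,
    "sine": np.sin,
    "tangent": np.tan,
    "exp": lambda x: np.exp(np.clip(x, -50, 50)),
    "cube": lambda x: x ** 3,
    "log": lambda x: np.log(np.abs(x) + 1e-8),
    "reciprocal": lambda x: 1.0 / (x + 1e-8),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-np.clip(x, -50, 50))),
}
BINARY_OPS = {
    "plus": np.add,
    "subtract": np.subtract,
    "multiply": np.multiply,
    "divide": lambda a, b: a / (b + 1e-8),
}

def generate_features(X, unary="log", binary="multiply", i=0, j=1):
    """Append one unary-derived and one binary-derived column to X."""
    new_unary = UNARY_OPS[unary](X[:, i])
    new_binary = BINARY_OPS[binary](X[:, i], X[:, j])
    return np.column_stack([X, new_unary, new_binary])

X = np.random.default_rng(0).normal(size=(100, 4))
X_aug = generate_features(X)
print(X_aug.shape)  # (100, 6)
```

In the full method, the choice of operation and operand columns at each of the T iterations would be proposed by the LLM rather than fixed as here.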