Dual-Agent Reinforcement Learning for Automated Feature Generation
Authors: Wanfu Gao, Zengyao Man, Hanlin Pan, Kunpeng Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on multiple datasets demonstrate that the proposed method is effective. We conduct experiments on 21 datasets from UCI [Public, 2024b], Kaggle [Howard, 2024], OpenML [Public, 2024a], and LibSVM [Lin, 2024], comprising 12 classification tasks and 9 regression tasks. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Jilin University, China; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; (3) Department of Computer Science, Portland State University, Portland, OR 97201, USA |
| Pseudocode | Yes | Pseudocode of DARL, experimental settings, a comparison of different downstream tasks, and a convergence analysis are presented in the Appendix. |
| Open Source Code | Yes | The code is available at https://github.com/extess0/DARL. |
| Open Datasets | Yes | We conduct experiments on 21 datasets from UCI [Public, 2024b], Kaggle [Howard, 2024], OpenML [Public, 2024a], and LibSVM [Lin, 2024], comprising 12 classification tasks and 9 regression tasks. |
| Dataset Splits | Yes | We adopt random forest as the downstream machine learning model and perform 5-fold stratified cross-validation in all experiments, instead of a simple 70%-30% split. |
| Hardware Specification | Yes | All experiments are conducted on the Ubuntu operating system, an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz, and a V100 GPU, with Python 3.10.12 and PyTorch 1.13.1. |
| Software Dependencies | Yes | All experiments are conducted on the Ubuntu operating system, an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz, and a V100 GPU, with Python 3.10.12 and PyTorch 1.13.1. |
| Experiment Setup | Yes | The number of epochs is limited to 200. By using 6 exploration steps per epoch, we further control the number of features generated. We adopt random forest as the downstream machine learning model and perform 5-fold stratified cross-validation in all experiments, instead of a simple 70%-30% split. We use the Adam [Kingma and Ba, 2015] optimizer with a learning rate of 0.0001 to optimize the DQN, set the memory limit of experience replay to 24, and the DQN batch size to 8. The model incorporates 8 attention heads, with a word embedding vector dimension of 8 and a model hidden layer dimension of 128. The discrimination agent's reward weights α, β, γ, and δ are set to 0.1, 0.1, 1, and 0.01. |
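The evaluation protocol in the Dataset Splits row (random forest as the downstream model, scored with 5-fold stratified cross-validation) can be sketched as follows. The model hyperparameters, scoring metric, and dataset are assumptions for illustration; the paper fixes only the model family and the CV scheme.

```python
# Sketch of the downstream evaluation: random forest + 5-fold
# stratified CV. load_wine is a stand-in for one of the 21 datasets;
# n_estimators, random seeds, and the F1 metric are assumptions.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_wine(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # stratified folds
scores = cross_val_score(model, X, y, cv=cv, scoring="f1_weighted")
print(f"5-fold weighted F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratified folds preserve the class distribution in each split, which matters on the smaller classification datasets; for the 9 regression tasks a plain `KFold` would be used instead.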
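The DQN training configuration quoted in the Experiment Setup row (Adam at learning rate 0.0001, an experience-replay memory limited to 24, batch size 8) could be wired up roughly as below. The Q-network architecture, transition format, and loss target are assumptions; only the optimizer, buffer size, and batch size come from the paper.

```python
# Sketch of the quoted DQN settings: Adam lr=1e-4, replay memory
# capped at 24, mini-batches of 8. The tiny MLP (input 8, hidden 128)
# and the dummy regression targets are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 4))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
replay = deque(maxlen=24)  # memory limit of experience replay
BATCH_SIZE = 8

# Fill the buffer with dummy (state, target-Q) pairs, then do one update.
for _ in range(24):
    replay.append((torch.randn(8), torch.randn(4)))
batch = random.sample(replay, BATCH_SIZE)
states = torch.stack([s for s, _ in batch])
targets = torch.stack([t for _, t in batch])
loss = nn.functional.mse_loss(q_net(states), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss after one update: {loss.item():.4f}")
```

A 24-transition buffer is unusually small for DQN; here it simply means the agent learns from only the most recent couple of dozen feature-generation steps.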