Large Language Models Synergize with Automated Machine Learning
Authors: Jinglue Xu, Jialong Li, Zhen Liu, NAV Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments across various ML tasks, our method outperforms existing methods in 10 out of 12 tasks for generating ML programs. In addition, AutoML significantly improves the performance of the generated ML programs. In experiments, given the textual task description, our method, Text-to-ML, generates the complete and optimized ML program in a fully autonomous process. |
| Researcher Affiliation | Academia | Jinglue Xu (University of Tokyo); Jialong Li (Tokyo Institute of Technology); Zhen Liu (University of Tokyo); Nagar Anthel Venkatesh Suryanarayanan (University of Tokyo); Guoyuan Zhou (Hosei University, Institute of Integrated Science and Technology); Jia Guo (Hosei University, Institute of Integrated Science and Technology); Hitoshi Iba (University of Tokyo); Kenji Tei (Tokyo Institute of Technology) |
| Pseudocode | Yes | Algorithm 1 Contextual Modular Generation Algorithm 2 Text-to-ML Algorithm 3 Constrained Generative Unit Testing |
| Open Source Code | Yes | The implementation of our method is available at https://github.com/JLX0/llm-automl. |
| Open Datasets | Yes | Boston: The Boston dataset is a widely used regression dataset that contains information about housing in the suburbs of Boston. It consists of 506 samples, each representing a suburb, with 13 features describing various aspects of the housing environment, such as crime rate, average number of rooms, and accessibility to highways. The target variable is the median value of owner-occupied homes in thousands of dollars. This dataset is often used for regression tasks to predict housing prices based on the given features. We obtain the files of the dataset from Altavish (2023). Iris: The Iris dataset is a classic classification dataset used to demonstrate various machine learning algorithms and techniques. It contains 150 samples of iris flowers from three different species: setosa, versicolor, and virginica. There are four features for each flower: sepal length, sepal width, petal length, and petal width. The goal is to classify the flowers into their respective species based on these features, making it a popular dataset for teaching and practicing classification algorithms. We obtain the files of the dataset from Learning (2023). CIFAR-10: The CIFAR-10 dataset is a well-known benchmark for image classification tasks. It consists of 60,000 32x32 color images across 10 different classes, with 6,000 images per class. The classes include objects like airplanes, cars, birds, cats, and more. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images, making it suitable for evaluating the performance of various image classification algorithms. We obtain the files of the dataset from Krizhevsky et al. (2023). IMDb Reviews: The IMDB dataset is often used for sentiment analysis and text classification tasks. ... We obtain the files of the dataset from Lakshmi25npathi (2023). AG News: The AG News dataset is commonly used for text classification tasks, particularly for news categorization. ... We obtain the files of the dataset from Rai (2023). |
| Dataset Splits | Yes | CIFAR-10 The CIFAR-10 dataset is a well-known benchmark for image classification tasks. It consists of 60,000 32x32 color images across 10 different classes, with 6,000 images per class. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images, making it suitable for evaluating the performance of various image classification algorithms. We obtain the files of the dataset from Krizhevsky et al. (2023). |
| Hardware Specification | Yes | For deep learning tasks, we train each model on 1 of 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | In our experiments, we utilize the Python libraries PyTorch, PyTorch Lightning, and Transformers for deep learning tasks and Scikit-learn, XGBoost, CatBoost, and LightGBM for tasks involving tabular data. |
| Experiment Setup | Yes | For BOHB, we use the default hyperparameters in Falkner et al. (2018) and set 30 epochs as the maximum budget for deep learning tasks. ... Table 5 (finetuning strategy search space): Batch size: int, log scale, [2, 64]; Learning rate: float, log scale, [1e-5, 1e-1]; Weight decay: float, log scale, [1e-4, 1e-1]; Momentum: float, [0.01, 0.99]; Optimizer: categorical, {SGD, Adam, AdamW}; Scheduler: categorical, {plateau, cosine} |
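The Table 5 search space quoted above can be sketched in plain Python for illustration. This is a minimal, hypothetical rendering (the dictionary layout, `SEARCH_SPACE`, and `sample_config` are assumptions, not the paper's code): the authors run BOHB (Falkner et al., 2018), which performs model-based multi-fidelity search rather than the simple random sampling shown here.

```python
import math
import random

# Hypothetical encoding of the Table 5 finetuning search space.
# "log" marks hyperparameters sampled log-uniformly, as in the paper's table.
SEARCH_SPACE = {
    "batch_size":    {"type": "int",   "scale": "log",    "range": (2, 64)},
    "learning_rate": {"type": "float", "scale": "log",    "range": (1e-5, 1e-1)},
    "weight_decay":  {"type": "float", "scale": "log",    "range": (1e-4, 1e-1)},
    "momentum":      {"type": "float", "scale": "linear", "range": (0.01, 0.99)},
    "optimizer":     {"type": "cat",   "choices": ("SGD", "Adam", "AdamW")},
    "scheduler":     {"type": "cat",   "choices": ("plateau", "cosine")},
}

def sample_config(space, rng=random):
    """Draw one configuration, using log-uniform sampling where indicated."""
    config = {}
    for name, spec in space.items():
        if spec["type"] == "cat":
            config[name] = rng.choice(spec["choices"])
            continue
        lo, hi = spec["range"]
        if spec["scale"] == "log":
            # Sample uniformly in log space, then map back.
            value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        else:
            value = rng.uniform(lo, hi)
        config[name] = round(value) if spec["type"] == "int" else value
    return config
```

In a real BOHB run this space would instead be declared as a `ConfigSpace.ConfigurationSpace` and passed to the optimizer, which allocates budgets (here capped at 30 epochs) across sampled configurations.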