Architecture-Aware Learning Curve Extrapolation via Graph Ordinary Differential Equation
Authors: Yanna Ding, Zijie Huang, Xiao Shou, Yihang Guo, Yizhou Sun, Jianxi Gao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through variational parameters. Our model outperforms current state-of-the-art learning curve extrapolation methods and pure time-series modeling approaches for both MLP and CNN-based learning curves. Additionally, we explore the applicability of our method in Neural Architecture Search scenarios, such as training configuration ranking. We showcase LC-GODE's ability to forecast model performance across diverse AutoML benchmarks. First, we compare it to six learning curve extrapolation methods on real-world datasets using stochastic gradient descent for tabular and image tasks, training each source task separately. Next, we evaluate its effectiveness in ranking training configurations by predicted optimal performance. Finally, we analyze model sensitivity to architecture variants, time-series encoders, and hyperparameters. |
| Researcher Affiliation | Academia | Yanna Ding¹, Zijie Huang², Xiao Shou³, Yihang Guo², Yizhou Sun², Jianxi Gao¹ — ¹Rensselaer Polytechnic Institute, ²University of California, Los Angeles, ³Baylor University. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical equations and text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and supplementary materials are publicly available1. 1https://github.com/dingyanna/LC-GODE.git |
| Open Datasets | Yes | Specifically for MLPs, we use the *car* and *segment* binary classification tabular tasks from OpenML (Vanschoren et al. 2014) as source tasks. For CNN-based models, we employ the NAS-Bench-201 dataset (Dong and Yang 2020), which provides comprehensive learning curves for each architecture over a span of 200 epochs across two image datasets: CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009). |
| Dataset Splits | Yes | Furthermore, we reserve 20% of all trials as the test set for each MLP source task and 25% for each CNN source task. |
| Hardware Specification | No | The paper mentions "Training neural architectures is a resource-intensive endeavor, often demanding considerable computational power and time" but does not specify any particular hardware (e.g., GPU, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper describes the use of various software components and methods (e.g., RNN with GRU units, GCN layers, Runge-Kutta 4 method), but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The condition length is set to 10 epochs for all methods in this experiment. The instantiation of LC-GODE that we report features: (i) an architecture encoder that utilizes 2 layers and employs a learnable pooling technique, (ii) an observed time-series encoder implemented using GRU, (iii) an ODE function with a 2-layer MLP, integrated using the Runge-Kutta 4 method (Butcher 1996). The model is trained with early stopping, halting if no improvement is observed for 50 epochs. We analyze the impact of three hyperparameters: maximal time Tmax, observation length, and hidden dimension (Figure 6). |
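The reported instantiation (GRU encoder over the first 10 observed epochs, a 2-layer MLP as the ODE function, Runge-Kutta 4 integration) can be illustrated with a minimal sketch. This is not the authors' code: the weights are random and untrained, the single GRU-like cell and all parameter names (`gru_encode`, `ode_func`, `HIDDEN`) are hypothetical stand-ins chosen only to make the pipeline concrete and runnable.

```python
# Hedged sketch of the reported LC-GODE configuration, NOT the authors'
# implementation: a GRU-style cell summarizes the first 10 observed epochs
# (the "condition length") into a latent state, then a 2-layer MLP ODE
# function is integrated with classic Runge-Kutta 4 to extrapolate.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16  # hidden dimension (one of the hyperparameters the paper varies)

def gru_encode(curve, W, U, b):
    """Fold a 1-D learning curve into a latent state with one GRU-like cell."""
    h = np.zeros(HIDDEN)
    for x in curve:
        z = 1.0 / (1.0 + np.exp(-(W[0] * x + U[0] @ h + b[0])))   # update gate
        r = 1.0 / (1.0 + np.exp(-(W[1] * x + U[1] @ h + b[1])))   # reset gate
        h_cand = np.tanh(W[2] * x + U[2] @ (r * h) + b[2])        # candidate
        h = (1.0 - z) * h + z * h_cand
    return h

def ode_func(h, A1, b1, A2, b2):
    """2-layer MLP parameterizing dh/dt, mirroring the reported setup."""
    return A2 @ np.tanh(A1 @ h + b1) + b2

def rk4_step(h, dt, f):
    """One classic Runge-Kutta 4 step (Butcher 1996)."""
    k1 = f(h)
    k2 = f(h + dt / 2 * k1)
    k3 = f(h + dt / 2 * k2)
    k4 = f(h + dt * k3)
    return h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Random (untrained) parameters purely so the sketch executes.
W = rng.normal(size=(3, HIDDEN))
U = rng.normal(size=(3, HIDDEN, HIDDEN)) * 0.1
b = np.zeros((3, HIDDEN))
A1, b1 = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1, np.zeros(HIDDEN)
A2, b2 = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1, np.zeros(HIDDEN)

observed = np.linspace(0.5, 0.8, 10)       # first 10 epochs of accuracy
h = gru_encode(observed, W, U, b)          # latent initial state at epoch 10
f = lambda s: ode_func(s, A1, b1, A2, b2)
latents = [h]
for _ in range(190):                       # extrapolate latent state to epoch 200
    latents.append(rk4_step(latents[-1], 0.1, f))
```

In the actual model a decoder would map each latent state back to a performance value, and variational parameters would supply the uncertainty estimates the paper reports; both are omitted here for brevity.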