Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

Authors: Weikai Li, Ding Wang, Zijian Ding, Atefeh Sohrabizadeh, Zongyue Qin, Jason Cong, Yizhou Sun

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of the proposed hierarchical MoE. We conduct extensive experiments on the largest HLS benchmark dataset, HLSyn (Bai et al. 2023), and the experiment results reveal the effectiveness of hierarchical MoE. Table 1 shows the main results. In the main experiments, we use the second design of the high-level gating network as it performs better. We conduct both offline and online evaluations. In the offline evaluation, we calculate the fine-tuned regression model's mean squared error (MSE) on the left-out data points in the target kernels for each regression objective and sum up the five objectives' MSEs. In the online evaluation, we do design space exploration (DSE) to search for good pragma designs.
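The offline metric described in the quote above (per-objective MSE on the held-out points, summed over the five regression objectives) can be sketched roughly as follows; the array layout and function name are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def summed_mse(preds, targets):
    """Offline metric: per-objective MSE on held-out points, summed.

    preds, targets: arrays of shape (n_points, 5), one column per
    regression objective (illustrative layout, not from the paper).
    """
    per_objective = ((preds - targets) ** 2).mean(axis=0)  # MSE per objective
    return per_objective.sum()  # single scalar summed over the 5 objectives
```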
Researcher Affiliation | Academia | Weikai Li, Ding Wang, Zijian Ding, Atefeh Sohrabizadeh, Zongyue Qin, Jason Cong, Yizhou Sun. University of California, Los Angeles. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper includes code snippets for illustration (e.g., "Code snippet from the Covariance kernel") and mathematical formulations of the model (e.g., Equations 1-6), but it does not contain any clearly labeled "Pseudocode" or "Algorithm" block for the proposed method. The steps are described in prose.
Open Source Code | Yes | We publicized our codes at https://github.com/weikai-li/HierarchicalMoE.
Open Datasets | Yes | We use one of the most comprehensive benchmark datasets, HLSyn (Bai et al. 2023).
Dataset Splits | Yes | We want to mimic the domain transfer situation of having scarce but representative labeled data on the target kernels, so we use 50 data points per kernel to fine-tune our regression model, and roughly the same ratio of data points, 265 samples per kernel, to fine-tune the classification model.
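The per-kernel few-shot split the quote describes (a fixed number of labeled points per target kernel for fine-tuning, the remainder held out for offline evaluation) could look roughly like this; the helper and its defaults are illustrative assumptions, not the paper's actual split code:

```python
import random

def per_kernel_split(samples_by_kernel, n_finetune=50, seed=0):
    """Split each target kernel's data: n_finetune points for
    fine-tuning, the rest held out for offline evaluation.
    (Illustrative sketch; the paper's split logic may differ.)
    """
    rng = random.Random(seed)
    finetune, heldout = {}, {}
    for kernel, samples in samples_by_kernel.items():
        shuffled = samples[:]          # copy before shuffling in place
        rng.shuffle(shuffled)
        finetune[kernel] = shuffled[:n_finetune]
        heldout[kernel] = shuffled[n_finetune:]
    return finetune, heldout
```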
Hardware Specification | No | The paper mentions that the model "needs about 32 GB of memory and can still fit into a single GPU," but does not specify the model of the GPU or any other specific hardware used for training or running experiments. It also mentions "Xilinx Alveo U200 FPGA" as the *target* hardware for HLS, not the hardware used to run the machine learning experiments.
Software Dependencies | Yes | We utilize the AMD/Xilinx HLS tool, Vitis 2021.1 (AMD/Xilinx 2020), to run HLS targeting the Xilinx Alveo U200 FPGA with a working frequency of 250MHz.
Experiment Setup | Yes | Based on our pre-explorations, we use 4 experts in the low-level MoEs, and we set the regularization terms' weights of both low-level and high-level MoEs to 5e-3. In the first stage, containing T epochs (warmup), we train the three expert models individually.
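For intuition, a softmax-gated mixture with 4 experts and a small gating regularizer (weight 5e-3, as quoted) might be sketched as below. The specific load-balancing penalty used here (squared coefficient of variation of the mean gate weights) is a common choice and an assumption of this sketch, not necessarily the paper's exact regularizer:

```python
import numpy as np

NUM_EXPERTS = 4    # low-level MoE experts (from the paper)
REG_WEIGHT = 5e-3  # regularization weight (from the paper)

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_output(gate_logits, expert_outputs):
    """Combine expert outputs with softmax gating weights.

    gate_logits:    (batch, NUM_EXPERTS)
    expert_outputs: (batch, NUM_EXPERTS, d)
    returns:        (batch, d)
    """
    w = softmax(gate_logits)                        # (batch, E)
    return (w[..., None] * expert_outputs).sum(1)   # weighted sum of experts

def load_balance_reg(gate_logits):
    """Illustrative load-balancing penalty: squared coefficient of
    variation of the mean gate weight per expert (zero when experts
    are used uniformly)."""
    w = softmax(gate_logits).mean(axis=0)           # mean weight per expert
    return REG_WEIGHT * (w.std() / w.mean()) ** 2
```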