Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
Authors: WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three heterogeneous clusters, comprising six different types of GPUs, demonstrate that Poplar achieves a training throughput improvement of 1.02-3.92x over current state-of-the-art heterogeneous training systems. |
| Researcher Affiliation | Academia | 1. School of Computer Science, Peking University; 2. Center for Information Research, Academy of Military Sciences; 3. Advanced Institute of Big Data |
| Pseudocode | Yes | Algorithm 1: Heterogeneity Aware of each GPU |
| Open Source Code | No | We will publish all source codes of this work on Github for further research explorations. |
| Open Datasets | Yes | All experiments are evaluated on the wikitext2-v1 dataset (Merity et al. 2016). |
| Dataset Splits | No | All experiments are evaluated on the wikitext2-v1 dataset (Merity et al. 2016). |
| Hardware Specification | Yes | Our experiments are conducted on three heterogeneous GPU clusters, each cluster contains two types of GPUs, as shown in Table 1. ... A100 80GB + A100 40GB; V100 16GB + T4 16GB; A800 80GB + V100S 32GB |
| Software Dependencies | No | We have implemented our work on PyTorch with around 2000+ lines of code. |
| Experiment Setup | Yes | We maintain a global batch size of 2 million tokens throughout our experiments. |