Pfeife: Automatic Pipeline Parallelism for PyTorch

Authors: Ho Young Jhoo, Chung-Kil Hur, Nuno P. Lopes

ICML 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | We evaluate Pfeife in three ways: (1) applicability of the approach, (2) accuracy of cost estimations, and (3) end-to-end performance comparison with existing frameworks. ... Table 1: Throughput comparison of pipeline parallelism (item/s).
Researcher Affiliation | Collaboration | 1 Seoul National University, Republic of Korea; 2 INESC-ID / Instituto Superior Técnico, University of Lisbon, Portugal; 3 Furiosa AI, Republic of Korea.
Pseudocode | Yes | Algorithm 1 shows the pseudo-code. ... Algorithm 1: Graph-schedule co-optimization.
Open Source Code | Yes | Pfeife is available at https://github.com/MerHS/pfeife.
Open Datasets | Yes | We used TorchBench (Hao et al., 2023), which is the official PyTorch benchmark suite. It includes a wide range of models. ... Vision Transformer (ViT-g/14) (Zhai et al., 2022) and GPT2-large (Radford et al., 2019) ... Llama2-7B (Touvron et al., 2023), and a diffusion model (Stable Diffusion-XL) (Podell et al., 2023).
Dataset Splits | No | The paper evaluates models such as ViT-g/14, Llama2-7B, and Stable Diffusion-XL from TorchBench, but it does not explicitly state how data were split into training, validation, or test sets for its experiments. It mentions "mini-batch size" and "total batch count" but not data partitioning for evaluation.
Hardware Specification | Yes | For coverage and correctness, we used a small server with 8x NVIDIA RTX 3090 24 GiB GPUs with 4 NVLink connections. For the end-to-end experiments, we used a larger server with 8x A100 40GB GPUs with NVSwitch.
Software Dependencies | Yes | ML models are written in plain PyTorch. They are then compiled using PyTorch 2's torch.compile (Ansel et al., 2024), as is now common.
Experiment Setup | Yes | Listing 1 shows an example of the full code required to train a model with Pfeife... optimizer = torch.optim.Adam(main_model.parameters(), lr=1e-5); criterion = torch.nn.CrossEntropyLoss() ... (B) Total batch count: number of mini-batches. (Nl) Loop count: how many times the forward loop is executed. (Bl) Loop batch count: how many mini-batches go through the forward pass of a single stage. (Bf) Prefetch batch count: a list with the number of forward passes each device runs in addition to Bl before it runs its first backward pass; |Bf| = |D|.
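To make the batch parameters above concrete, here is a small illustrative sketch (not Pfeife's actual scheduler) of how one device's forward/backward ordering could follow from Bl and its Bf entry: the device runs Bl plus its prefetch count of forward passes as warmup, then alternates one forward and one backward in a 1F1B-style pattern. The function name and the exact B = total-mini-batch accounting are assumptions for illustration only.

```python
def device_schedule(B, Bl, Bf_d):
    """Hypothetical per-device event order for B mini-batches.

    B    -- total batch count (number of mini-batches)
    Bl   -- loop batch count (forwards per loop on a single stage)
    Bf_d -- this device's prefetch entry: extra forwards run in
            addition to Bl before its first backward pass

    Returns a list of ('F', i) / ('B', i) events over batches 0..B-1.
    """
    events = []
    warmup = min(Bl + Bf_d, B)  # forwards before the first backward
    fwd = bwd = 0
    while fwd < warmup:         # warmup phase: forwards only
        events.append(('F', fwd))
        fwd += 1
    while bwd < B:              # steady state: 1F1B, then drain backwards
        events.append(('B', bwd))
        bwd += 1
        if fwd < B:
            events.append(('F', fwd))
            fwd += 1
    return events

sched = device_schedule(B=6, Bl=2, Bf_d=1)
# Each mini-batch's forward pass precedes its backward pass.
assert all(sched.index(('F', i)) < sched.index(('B', i)) for i in range(6))
```

With B=6, Bl=2, Bf_d=1 this yields three warmup forwards followed by alternating backward/forward events, matching the intuition that a larger prefetch count fills the pipeline deeper before gradients start flowing back.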