reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces

Authors: Anjiang Wei, Allen Nie, Thiago S. F. X. Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, Alex Aiken

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate that mappers optimized by LLM-powered agents not only match but often surpass expert-written mappers, achieving up to 1.34× speedup across nine benchmarks. ... Empirical Evaluation of Performance: Our agent-based solution achieves up to 1.34× speedup across nine benchmarks
Researcher Affiliation	Collaboration	(1) Stanford University, (2) Intel, (3) NVIDIA, (4) Nanjing University.
Pseudocode	Yes	We show how we use Trace to incorporate the feedback from the execution to update the agent, with a PyTorch-like syntax. (Figure A2) High-level structure of the Trace-based agent template, where functions annotated with @bundle(trainable=True) define the search space that the LLM optimizer updates during mapper generation. (Figure A3)
Open Source Code	No	The paper does not contain a specific link to the source code for the methodology described, nor does it explicitly state that the code is being released. It references the 'Trace' framework but not its own implementation.
Open Datasets	Yes	Our evaluation utilizes a suite of 9 benchmarks, including 3 scientific computing workloads and 6 well-known matrix multiplication algorithms. Circuit is a simulation benchmark that models electrical circuit behavior by simulating currents and voltages across interconnected nodes and wires (Bauer et al., 2012). Stencil simulates a 2D grid where each point's value is updated based on a stencil pattern determined by its neighbors (Van der Wijngaart & Mattson, 2014). Pennant models unstructured mesh Lagrangian staggered-grid hydrodynamics, commonly used for simulating compressible flow (Ferenbaugh, 2015).
Dataset Splits	No	The paper focuses on performance optimization of mappers for parallel programs on benchmarks, rather than traditional machine learning tasks involving training, validation, and test datasets. Therefore, specific dataset split information is not applicable or provided.
Hardware Specification	Yes	Experiments are conducted on one node with two Intel 10-core E5-2640 v4 CPUs, 256G main memory, and four NVIDIA Tesla P100 GPUs.
Software Dependencies	No	The paper mentions using 'gpt-4o-2024-08-06' and the 'Trace' framework, but does not specify version numbers for general programming languages (like Python) or other libraries required to replicate the experiments.
Experiment Setup	Yes	running 10 iterations per application. To account for stochastic output, we repeated the process 5 times and report the average. ... The agent takes two inputs: server specifications and application metadata. Server specifications detail the hardware configuration, including the number of CPUs and GPUs per node, as well as the total node count. Application metadata provides information on task names and the associated data arguments accessed by each task.
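
The experiment setup row above (10 iterations per application, repeated 5 times, averaged) can be illustrated with a minimal Python sketch. This is not the authors' code: the class names `ServerSpec` and `TaskMetadata`, the helper `average_over_runs`, the placeholder task and region names, and the synthetic `fake_run` scores are all hypothetical. The table also does not say exactly which quantity is averaged, so the sketch assumes a per-iteration average across the repeated runs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List

ITERATIONS_PER_RUN = 10  # from the table: 10 iterations per application
REPEATS = 5              # from the table: process repeated 5 times


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical container for the 'server specifications' agent input."""
    cpus_per_node: int
    gpus_per_node: int
    num_nodes: int


@dataclass(frozen=True)
class TaskMetadata:
    """Hypothetical container for the 'application metadata' agent input."""
    task_name: str
    data_arguments: List[str]


def average_over_runs(
    run_once: Callable[[int], List[float]],
    repeats: int = REPEATS,
) -> List[float]:
    """Run the optimization `repeats` times and average scores per iteration.

    Assumption: each run returns one score per iteration, and the reported
    number is the mean across runs at each iteration index.
    """
    runs = [run_once(seed) for seed in range(repeats)]
    for scores in runs:
        if len(scores) != ITERATIONS_PER_RUN:
            raise ValueError("each run must report one score per iteration")
    # zip(*runs) groups the scores of all runs by iteration index.
    return [mean(column) for column in zip(*runs)]


# Hardware from the table: two CPUs and four GPUs on a single node.
server = ServerSpec(cpus_per_node=2, gpus_per_node=4, num_nodes=1)

# Placeholder metadata; the task and argument names are made up.
app = [TaskMetadata(task_name="task_a", data_arguments=["region_x", "region_y"])]


def fake_run(seed: int) -> List[float]:
    """Synthetic stand-in for one 10-iteration optimization run."""
    return [1.0 + 0.01 * i + 0.001 * seed for i in range(ITERATIONS_PER_RUN)]


curve = average_over_runs(fake_run)
print(len(curve), "averaged iteration scores")
```

In a real replication, `fake_run` would be replaced by a function that invokes the LLM optimizer and measures the generated mapper on the target hardware; averaging per iteration rather than per run is a choice made only for this sketch.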