SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

Authors: Patrick Emami, Zhaonan Li, Saumya Sinha, Truc Nguyen

ICLR 2025

Reproducibility assessment — each item below gives the variable, the result, and the supporting quote or rationale drawn from the paper:
Research Type: Experimental. Evidence: "Our experiments on two real-world simulators of buildings and wind farms show that our SysCaps-augmented surrogates have better accuracy on held-out systems than traditional methods while enjoying new generalization abilities, such as handling semantically related descriptions of the same test system. Additional experiments also highlight the potential of SysCaps to unlock language-driven design space exploration and to regularize training through prompt augmentation."
Researcher Affiliation: Academia. Patrick Emami, Saumya Sinha, and Truc Nguyen (National Renewable Energy Laboratory); Zhaonan Li (Arizona State University).
Pseudocode: No. The paper describes the methods and model architecture using narrative text, figures (Figure 1, Figure 2), and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. Evidence: "As there are no standard benchmarks for comparing surrogate modeling performance for CES, we open-source all code and data at https://github.com/NREL/SysCaps to facilitate future work."
Open Datasets: Yes. Evidence: "Building stock simulation data: For the main experiments in Section 6.1-6.4 we train building stock surrogate models for the building energy simulator EnergyPlus (Crawley et al., 2001). ... We use commercial buildings from the Buildings-900K dataset (Emami et al., 2023b). ... This experiment uses the Wind Farm Wake Modeling Dataset (Ramos et al., 2023), made with the FLORIS simulator..."
Dataset Splits: Yes. Evidence: "Our training set is comprised of 330K buildings, and we use 100 buildings for validation and 6K held-out buildings for testing. We also reserved a held-out set of 10K buildings for RFE. ... In this dataset, there are only 500 unique system configurations (split 3:1:1 for train, val, test), although each configuration is simulated under 500 distinct atmospheric conditions."
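The 3:1:1 split of the 500 wind-farm configurations can be sketched as below. This is an illustrative reconstruction, not the authors' released code; the function name, seed, and use of configuration IDs are assumptions.

```python
import random

def split_configs(config_ids, seed=0):
    """Split system configuration IDs 3:1:1 into train/val/test.

    Splitting at the configuration level (rather than per simulation)
    keeps all 500 atmospheric conditions of a configuration in one split.
    """
    ids = list(config_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = 3 * n // 5  # 3 parts out of 5
    n_val = n // 5        # 1 part out of 5
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

train, val, test = split_configs(range(500))  # 300 / 100 / 100 configurations
```

Because each configuration appears in exactly one split, held-out test systems are genuinely unseen at training time.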
Hardware Specification: Yes. Evidence: "Generating these datasets with llama-2-7b-chat used 1.5K GPU hours on a cluster with 16 NVIDIA A100-40GB GPUs. ... All models are trained with a single NVIDIA A100-40GB GPU."
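As a sanity check on the reported compute, 1.5K GPU hours spread across 16 GPUs corresponds to just under four days of wall-clock time, assuming all 16 GPUs run in parallel at full utilization (an assumption; the paper does not state the scheduling):

```python
gpu_hours = 1500   # reported llama-2-7b-chat caption-generation cost
num_gpus = 16      # reported cluster size (A100-40GB)

wall_clock_hours = gpu_hours / num_gpus   # 93.75 hours
wall_clock_days = wall_clock_hours / 24   # about 3.9 days
print(wall_clock_hours, round(wall_clock_days, 1))
```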
Software Dependencies: No. The paper mentions several software components, frameworks, and models such as llama-2-7b-chat (Touvron et al., 2023), BERT (Devlin et al., 2018), DistilBERT (Sanh et al., 2019), LightGBM (Ke et al., 2017), and Optuna (Akiba et al., 2019). However, it does not specify exact version numbers for these or other crucial software dependencies (e.g., Python, PyTorch/TensorFlow) required for reproducibility.
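The missing version information could be captured with a short environment report. The helper below is a generic sketch (the function name and package list are illustrative, not from the paper); it looks up installed versions rather than inventing the versions the paper omitted.

```python
import sys
from importlib import metadata

def environment_report(packages):
    """Record the Python and installed package versions for a reproducibility appendix."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report

# e.g. environment_report(["torch", "transformers", "lightgbm", "optuna"])
```

Emitting such a report alongside released code would resolve this reproducibility gap without any manual bookkeeping.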
Experiment Setup: Yes. Evidence: "We carefully tune the hyperparameters of all models (details in Appendix A.2). See Table 6 for hyperparameter sweep details for the buildings experiments and Table 7 for hyperparameter sweep details for the wind farm experiments."
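The paper tunes with Optuna; as a dependency-free illustration of the same sweep pattern, here is a minimal random search. The sweep space and ranges below are invented for the example and are not the values from Table 6 or Table 7.

```python
import random

# Hypothetical sweep space, not the paper's actual ranges.
SWEEP = {
    "lr": [1e-4, 3e-4, 1e-3],
    "hidden_dim": [128, 256, 512],
    "dropout": [0.0, 0.1, 0.3],
}

def random_search(objective, n_trials=20, seed=0):
    """Sample configs from SWEEP and return (best_loss, best_config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SWEEP.items()}
        loss = objective(cfg)  # e.g. validation loss of a trained surrogate
        if best is None or loss < best[0]:
            best = (loss, cfg)
    return best

# Toy objective standing in for "train a surrogate, return validation loss".
best_loss, best_cfg = random_search(lambda c: c["lr"] + c["dropout"])
```

A framework like Optuna adds pruning and smarter samplers on top of this basic loop, but the trial structure is the same.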