Broadband Ground Motion Synthesis by Diffusion Model with Minimal Condition

Authors: Jaeheun Jung, Jaehyuk Lee, Chang-Hae Jung, Hanyoung Kim, Bosung Jung, Donghun Lee

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the High-fidelity Earthquake Ground-motion Generation System (HEGGS) and demonstrate its superior performance using earthquakes from the North American, East Asian, and European regions. HEGGS exploits the intrinsic characteristics of earthquake datasets and learns the waveforms using an end-to-end differentiable generator containing a conditional latent diffusion model and a high-fidelity waveform construction model. We show the learning efficiency of HEGGS by training it on a single-GPU machine and validate its performance using earthquake databases from North America, East Asia, and Europe, using diverse criteria from waveform generation tasks and seismology. Once trained, HEGGS can generate three-dimensional E-N-Z seismic waveforms with accurate P/S phase arrivals, envelope correlation, signal-to-noise ratio, GMPE analysis, frequency content analysis, and section plot analysis.
Researcher Affiliation | Academia | Department of Mathematics, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, Republic of Korea. Correspondence to: Donghun Lee <EMAIL>.
Pseudocode | Yes | After training the diffusion model with HEGGS, we generate waveforms with the conventional reverse process by setting z_T^tgt to Gaussian noise or z^src. The details, with pseudocode for training and generation, can be found in Appendix J. [...] Algorithm 1 HEGGS training [...] Algorithm 2 Generation
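Algorithm 2 itself is not reproduced in this report; the quoted passage describes standard DDPM-style ancestral sampling started from Gaussian noise (or an encoded source latent z^src). Below is a minimal, generic sketch in plain Python with a placeholder noise-prediction model standing in for HEGGS's conditional denoiser. The linear beta schedule and all names are assumptions for illustration, not the authors' code.

```python
import math
import random

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; returns per-step betas, alphas, and
    cumulative products alpha_bar_t (an assumed, commonly used choice)."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def reverse_process(eps_model, z_T, T=1000, seed=0):
    """Generic DDPM ancestral sampling: start from z_T (Gaussian noise,
    or an encoded source latent in HEGGS's setting) and denoise to z_0.
    eps_model(z, t) predicts the noise; a real model would be conditional."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    z = list(z_T)
    for t in range(T - 1, -1, -1):
        eps = eps_model(z, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(zi - coef * ei) / math.sqrt(alphas[t]) for zi, ei in zip(z, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple sigma_t = sqrt(beta_t) choice
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z
```

With a zero-noise-predicting dummy model, the loop runs end to end and returns a latent of the same dimension, which is the shape contract the real conditional model would have to satisfy.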
Open Source Code | No | The paper does not explicitly state that the authors are releasing their source code, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | We exploit this intrinsic pair-ability of the seismic waveform datasets, and construct paired waveform-metadata datasets from three earthquake databases from different continents: INSTANCE (Michelini et al., 2021) from Europe, KMA (Han et al., 2023) from East Asia, and SCEDC (SCEDC, 2013) from North America.
Dataset Splits | Yes | We split each dataset into a training dataset and a test dataset according to the earthquake event, to evaluate the fidelity of generated waveforms for earthquakes unseen during training. [...]

Table 4. Features of each dataset

Feature | SCEDC Train | SCEDC Test | KMA Train | KMA Test | INSTANCE Train | INSTANCE Test
#observations | 71,488 | 17,878 | 237,755 | 58,925 | 72,904 | 19,872
#source events | 2,098 | 525 | 2,052 | 514 | 2,265 | 593
#stations | 149 | 149 | 134 | 134 | 578 | 534
average #stations per event | 34.07 | 34.05 | 115.87 | 114.64 | 24.43 | 25.29
average magnitude | 2.45 | 2.45 | 1.45 | 1.45 | 3.36 | 3.36
average epicentral distance | 125.25 | 126.71 | 235.48 | 234.22 | 57.82 | 57.79
average focus depth | 8.51 | 8.65 | 11.52 | 11.73 | 12.47 | 11.97
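The split described is event-disjoint: every record from a given earthquake goes to either the training set or the test set, never both, so test events are truly unseen. A minimal sketch of such a split, assuming hypothetical record dicts with an 'event_id' key (not the authors' code):

```python
import random
from collections import defaultdict

def split_by_event(records, test_frac=0.2, seed=42):
    """Split waveform records into train/test so that no earthquake event
    appears in both sets. The split fraction applies to events, not to
    individual observations, so set sizes vary with stations per event."""
    by_event = defaultdict(list)
    for r in records:
        by_event[r["event_id"]].append(r)
    events = sorted(by_event)
    random.Random(seed).shuffle(events)  # deterministic shuffle for reproducibility
    n_test = max(1, int(len(events) * test_frac))
    test_events = set(events[:n_test])
    train = [r for e in events[n_test:] for r in by_event[e]]
    test = [r for e in test_events for r in by_event[e]]
    return train, test
```

Because the ratio is taken over events, the observation-level ratio only approximates test_frac, which is consistent with the roughly 80/20 observation counts in Table 4.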
Hardware Specification | Yes | We implement using a single NVIDIA RTX A6000 with 48GB memory.
Software Dependencies | No | The paper mentions several software components and tools like the "AdamW optimizer", "pytorch.optim defaults", "U-Net backbone", "ACM", "MP-SeNet (Lu et al., 2023)", "EQTransformer (Mousavi et al., 2020) provided by SeisBench (Woollam et al., 2022)", and the "pynga (Wang, 2012) implementation". However, specific version numbers for key libraries like PyTorch, and the full software environment, are not provided.
Experiment Setup | Yes | For training, we set the number of epochs to 500 and the training batch size to 4. To enhance training efficiency, we apply an accumulation step of 4, resulting in an effective batch size of 16. For the loss, we set the maximum number of diffusion steps to T = 1000 and the SNR weight to 5. We minimize the loss with the AdamW optimizer, with a learning rate of 10^-5 and pytorch.optim defaults. During training, we applied learning-rate decay with a linear scheduler.
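As a sanity check on the quoted setup, the pieces fit together as follows: batch size 4 with 4 accumulation steps yields an effective batch of 16, and the learning rate decays linearly from 10^-5. The decay endpoint (final_lr = 0) and all names below are assumptions, since the paper states only "linear scheduler":

```python
BATCH_SIZE = 4
ACCUM_STEPS = 4
EFFECTIVE_BATCH = BATCH_SIZE * ACCUM_STEPS  # 16, matching the quoted effective batch size

def linear_lr(epoch, total_epochs=500, base_lr=1e-5, final_lr=0.0):
    """Linear learning-rate decay from base_lr to final_lr across training.
    The final_lr endpoint is an assumption; the paper does not state it."""
    frac = epoch / max(1, total_epochs - 1)
    return base_lr + (final_lr - base_lr) * frac

def should_step(iteration, accum_steps=ACCUM_STEPS):
    """With gradient accumulation, gradients from accum_steps consecutive
    micro-batches are summed and the optimizer steps once per group."""
    return (iteration + 1) % accum_steps == 0
```

In a PyTorch training loop this pattern corresponds to calling the optimizer's step (and zeroing gradients) only when should_step is true, so memory usage stays at the micro-batch size while the gradient statistics match the larger effective batch.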