Unlocking Point Processes through Point Set Diffusion

Authors: David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh, Stephan Günnemann

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on synthetic and real-world datasets demonstrate that POINT SET DIFFUSION achieves state-of-the-art performance in unconditional and conditional generation of spatial and spatiotemporal point processes while providing up to orders of magnitude faster sampling."
Researcher Affiliation | Academia | "David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh, Stephan Günnemann. Department of Informatics & Munich Data Science Institute, Technical University of Munich, Germany."
Pseudocode | Yes | Algorithm 1 (Conditional sampling):
Require: X^c_0 = C(X_0)
1: X_T ~ λ_ε
2: for t = T, ..., 1 do
3:   X̃_0 ~ p_θ(X_0 | X_t)
4:   X̃_{t-1} ~ q(X_{t-1} | X̃_0, X_t)   (reverse, Sec. 3.2)
5:   X^c_{t-1} ~ q(X^c_{t-1} | X^c_0)   (forward, Sec. 3.1)
6:   X_{t-1} = C̄(X̃_{t-1}) ∪ C(X^c_{t-1})
7: end for
8: return C̄(X_0)
Open Source Code | Yes | "Code is available at https://www.cs.cit.tum.de/daml/point-set-diffusion"
Open Datasets | Yes | "We evaluate our model on four benchmark datasets with their proposed pre-processing and splits: three real-world datasets, Japan Earthquakes (U.S. Geological Survey, 2024), New Jersey COVID-19 Cases (The New York Times, 2024), and Citibike Pickups (Citi Bike, 2024), and one synthetic dataset, Pinwheel, based on a multivariate Hawkes process (Soni, 2019)."
Dataset Splits | Yes | "We follow Chen et al. (2021) and evaluate our model on four benchmark datasets with their proposed pre-processing and splits."
Hardware Specification | Yes | "All models have been trained on an NVIDIA A100-PCIE-40GB."
Software Dependencies | No | "We use Adam as the optimizer and a fixed weight decay of 0.0001 to avoid overfitting."
Experiment Setup | Yes | "Hyperparameters: We use the same hyperparameters for all datasets and types of point processes. In a hyperparameter study (A.8), we found T = 100 for our cosine noise schedule (Nichol et al., 2021) to give a good trade-off between sampling time and quality. Further, we use a hidden dimension and embedding size of 32. For training, we use a batch size of 128 and a learning rate of 0.001. We use Adam as the optimizer and a fixed weight decay of 0.0001 to avoid overfitting. To avoid exploding gradients, we clip the gradients to a norm below 2."
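The structure of Algorithm 1 (conditional sampling) can be sketched in Python. This is a toy illustration only: `forward_noise` and `denoise_stub` are hypothetical stand-ins for the paper's point-set forward process q and the learned model p_θ, the thinning probability is an arbitrary cosine decay, and the condition operator C / complement C̄ is simplified to stacking arrays. What it preserves is the loop shape: denoise the full set, re-noise the conditioning points X^c_0 down to level t-1, and stitch the two parts back together each step.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, t, T):
    """Toy stand-in for the forward process q(X_t | X_0): thin points
    with a survival probability that decays in t, then add fresh noise
    points on [0, 1]^2 (all details hypothetical)."""
    if t == 0:  # at t = 0 the point set is clean
        return x0
    keep = rng.random(len(x0)) < np.cos(0.5 * np.pi * t / T) ** 2
    noise = rng.random((rng.poisson(3), 2))
    return np.vstack([x0[keep], noise])

def denoise_stub(xt, t):
    """Stand-in for the learned p_theta(X_0 | X_t); a trained model
    would predict the clean point set. Here points pass through."""
    return xt

def conditional_sample(xc0, T=10):
    """Shape of Algorithm 1: reverse-step the generated part while
    forward-noising the condition X^c_0 to the matching level, then
    combine both parts before the next iteration."""
    xt = rng.random((rng.poisson(5), 2))          # X_T ~ noise process
    for t in range(T, 0, -1):
        x0_hat = denoise_stub(xt, t)              # X~_0 ~ p_theta(X_0 | X_t)
        x_prev = forward_noise(x0_hat, t - 1, T)  # reverse step (toy)
        xc_prev = forward_noise(xc0, t - 1, T)    # forward-noise condition
        xt = np.vstack([x_prev, xc_prev])         # combine complement + condition
    return xt

cond = rng.random((4, 2))                          # conditioning points X^c_0
sample = conditional_sample(cond)
print(sample.shape)
```

Because the condition is re-noised from the clean X^c_0 at every step (and returned exactly at t = 0), the conditioning points survive into the final sample; the real algorithm additionally masks with C̄ so generated and conditioned regions do not overlap.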
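The experiment setup references the cosine noise schedule of Nichol et al. (2021) with T = 100 steps. A minimal sketch of that schedule, assuming the standard offset s = 0.008 from Nichol & Dhariwal (the paper's exact offset is not quoted here):

```python
import numpy as np

def cosine_alpha_bar(T=100, s=0.008):
    """Cosine schedule from Nichol & Dhariwal (2021):
    alpha_bar(t) = f(t) / f(0), f(t) = cos^2(((t/T + s) / (1 + s)) * pi/2).
    The review above quotes T = 100 as the paper's choice."""
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

alpha_bar = cosine_alpha_bar()
print(alpha_bar[0], alpha_bar[-1])  # starts at 1, decays toward 0
```

The schedule decays smoothly from alpha_bar(0) = 1 to nearly 0 at t = T, which is what makes a small step count like T = 100 viable compared with linear schedules that destroy signal too early.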