Preventing Conflicting Gradients in Neural Marked Temporal Point Processes

Authors: Tanguy Bosser, Souhaib Ben Taieb

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experiments on multiple real-world event sequence datasets, we demonstrate the benefits of our framework compared to the original model formulations."
Researcher Affiliation | Academia | Tanguy Bosser (Department of Computer Science, University of Mons); Souhaib Ben Taieb (Mohamed bin Zayed University of Artificial Intelligence; University of Mons)
Pseudocode | No | The paper describes its methodology using mathematical formulations and conceptual steps, but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | "Additionally, all our experiments are reproducible and implemented using a common code base." Repository: https://github.com/tanguybosser/grapTPP_tmlr
Open Datasets | Yes | "We use five real-world marked event sequence datasets frequently referenced in the neural MTPP literature: LastFM (Hidasi & Tikk, 2012), MOOC, Reddit (Kumar et al., 2019), Github (Trivedi et al., 2019), and Stack Overflow (Du et al., 2016). We employ the pre-processed version of these datasets as described in (Bosser & Ben Taieb, 2023)", openly accessible (MIT License) at: https://www.dropbox.com/sh/maq7nju7v5020kp/AAAFBvzxeNqySRsAm-zgU7s3a/processed/data?dl=0&subfolder_nav_tracking=1
Dataset Splits | Yes | "Each dataset is randomly partitioned into 3 train/validation/test splits (60%/20%/20%)."
Hardware Specification | Yes | "All models were trained on a machine equipped with an AMD Ryzen Threadripper PRO 3975WX CPU running at 4.1 GHz and a Nvidia RTX A4000 GPU."
Software Dependencies | No | "Our framework is implemented in a unified codebase using PyTorch." While PyTorch is mentioned, no specific version number is provided, which is required for a reproducible software dependency description.
Experiment Setup | Yes | "For all models, we minimize the average NLL in (17) on the training sequences using mini-batch gradient descent with the Adam optimizer (Kingma & Ba, 2014) and a learning rate of 10^-3. For the base models and the base+ setup, an early-stopping protocol interrupts training if the model fails to show improvement in the total validation loss (i.e., L_T + L_M) for 50 consecutive epochs. In the base++ setup, two distinct early-stopping protocols are implemented for the L_T and L_M terms, respectively. The optimization process can last for a maximum of 500 epochs... the dimension d_e of the event encodings is set to 8. Additionally, we chose a value of M = 32 for the number of mixture components. ... we set the number of GCIF projections to C = 32."
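The 60%/20%/20% random split protocol quoted above can be sketched as follows. This is a minimal illustration only; the function and variable names are our own and do not come from the authors' codebase, and the sequences here are placeholders for the actual event-sequence objects.

```python
import random

def split_sequences(sequences, seed):
    """Randomly partition event sequences into train/val/test (60%/20%/20%)."""
    rng = random.Random(seed)
    idx = list(range(len(sequences)))
    rng.shuffle(idx)
    n_train = int(0.6 * len(idx))
    n_val = int(0.2 * len(idx))
    train = [sequences[i] for i in idx[:n_train]]
    val = [sequences[i] for i in idx[n_train:n_train + n_val]]
    test = [sequences[i] for i in idx[n_train + n_val:]]
    return train, val, test

# Three independent random splits, one per seed, mirroring the
# "3 train/validation/test splits" mentioned in the paper.
splits = [split_sequences(list(range(100)), seed) for seed in (0, 1, 2)]
```

Reporting results over three such independent splits gives a simple estimate of variability due to the partitioning.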
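The early-stopping protocol described in the setup (patience of 50 epochs on the total validation loss, capped at 500 epochs) can be sketched independently of any model. The helper below is a hypothetical illustration of that logic, not the authors' code; in the base++ setup one would run two such monitors, one for the L_T term and one for the L_M term.

```python
def train_with_early_stopping(step_fn, val_loss_fn, patience=50, max_epochs=500):
    """Run training epochs, stopping when the validation loss has not
    improved for `patience` consecutive epochs.

    step_fn(epoch):  runs one epoch of mini-batch gradient descent.
    val_loss_fn():   returns the current total validation loss (L_T + L_M).
    """
    best_loss = float("inf")
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        step_fn(epoch)
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break  # no improvement for `patience` epochs
    return best_loss

# Toy usage: a "model" whose validation loss plateaus after epoch 10,
# so training is cut off once the patience budget is exhausted.
calls = []
losses = iter([1.0 - min(i, 10) * 0.05 for i in range(500)])
best = train_with_early_stopping(lambda e: calls.append(e), lambda: next(losses))
```

With Adam at a learning rate of 10^-3, `step_fn` would wrap one pass of mini-batch gradient descent over the training sequences.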