Preventing Conflicting Gradients in Neural Marked Temporal Point Processes
Authors: Tanguy Bosser, Souhaib Ben Taieb
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on multiple real-world event sequence datasets, we demonstrate the benefits of our framework compared to the original model formulations. |
| Researcher Affiliation | Academia | Tanguy Bosser, Department of Computer Science, University of Mons; Souhaib Ben Taieb, Mohamed bin Zayed University of Artificial Intelligence and University of Mons |
| Pseudocode | No | The paper describes methodologies using mathematical formulations and conceptual steps but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Additionally, all our experiments are reproducible and implemented using a common code base: https://github.com/tanguybosser/grapTPP_tmlr |
| Open Datasets | Yes | We use five real-world marked event sequence datasets frequently referenced in the neural MTPP literature: LastFM (Hidasi & Tikk, 2012), MOOC, Reddit (Kumar et al., 2019), Github (Trivedi et al., 2019), and Stack Overflow (Du et al., 2016). We employ the pre-processed version of these datasets as described in (Bosser & Ben Taieb, 2023), which can be openly accessed at this url: https://www.dropbox.com/sh/maq7nju7v5020kp/AAAFBvzxeNqySRsAm-zgU7s3a/processed/data?dl=0&subfolder_nav_tracking=1 (MIT License). |
| Dataset Splits | Yes | Each dataset is randomly partitioned into 3 train/validation/test splits (60%/20%/20%). |
| Hardware Specification | Yes | All models were trained on a machine equipped with an AMD Ryzen Threadripper PRO 3975WX CPU running at 4.1 GHz and a Nvidia RTX A4000 GPU. |
| Software Dependencies | No | Our framework is implemented in a unified codebase using PyTorch. While PyTorch is mentioned, no specific version number is provided, which is required for a reproducible software dependency description. |
| Experiment Setup | Yes | For all models, we minimize the average NLL in (17) on the training sequences using mini-batch gradient descent with the Adam optimizer (Kingma & Ba, 2014) and a learning rate of 10^-3. For the base models and the base+ setup, an early-stopping protocol interrupts training if the model fails to show improvement in the total validation loss (i.e., L_T + L_M) for 50 consecutive epochs. In the base++ setup, two distinct early-stopping protocols are implemented for the L_T and L_M terms, respectively. The optimization process can last for a maximum of 500 epochs... The dimension d_e of the event encodings is set to 8. Additionally, we chose a value of M = 32 for the number of mixture components. ... We set the number of GCIF projections to C = 32. |
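The experiment-setup row above describes Adam with a learning rate of 10^-3, a 500-epoch cap, and two separate early-stopping protocols for the time (L_T) and mark (L_M) loss terms in the base++ setup. The sketch below illustrates that optimization protocol in PyTorch. It is a minimal, hedged illustration: the `TinyMTPP` model, the synthetic data, and the MSE/cross-entropy stand-ins for the time and mark NLL terms are all hypothetical placeholders, not the paper's actual architecture or likelihood.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyMTPP(nn.Module):
    """Hypothetical stand-in for a neural MTPP model (event encoding dim d_e = 8)."""
    def __init__(self, d_e: int = 8, num_marks: int = 5):
        super().__init__()
        self.encoder = nn.Linear(2, d_e)           # (inter-arrival time, mark) -> encoding
        self.time_head = nn.Linear(d_e, 1)         # drives the time loss L_T (stub)
        self.mark_head = nn.Linear(d_e, num_marks) # drives the mark loss L_M (stub)

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.time_head(h), self.mark_head(h)

model = TinyMTPP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate 10^-3, as reported

# Synthetic data standing in for event sequences (illustrative only).
x = torch.randn(64, 2)
targets = torch.randn(64, 1)
marks = torch.randint(0, 5, (64,))

# base++-style dual early stopping: each loss term has its own patience counter.
patience = 50
best = {"LT": float("inf"), "LM": float("inf")}
waits = {"LT": 0, "LM": 0}
losses = []

for epoch in range(500):                                   # maximum of 500 epochs
    t_out, m_out = model(x)
    loss_t = nn.functional.mse_loss(t_out, targets)        # stand-in for the time NLL L_T
    loss_m = nn.functional.cross_entropy(m_out, marks)     # stand-in for the mark NLL L_M
    total = loss_t + loss_m
    opt.zero_grad()
    total.backward()
    opt.step()
    losses.append(total.item())
    # Update each term's patience counter independently.
    for name, val in (("LT", loss_t.item()), ("LM", loss_m.item())):
        if val < best[name]:
            best[name], waits[name] = val, 0
        else:
            waits[name] += 1
    if all(w >= patience for w in waits.values()):         # stop only when BOTH terms stall
        break
```

For the base and base+ setups described above, the two patience counters would collapse into a single counter on the summed validation loss L_T + L_M; the separation shown here mirrors only the base++ protocol.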