ExPERT: Modeling Human Behavior Under External Stimuli Aware Personalized MTPP
Authors: Subhendu Khatuya, Ritvik Vij, Paramita Koley, Samik Datta, Niloy Ganguly
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Towards evaluating the efficacy, we put together a comprehensive benchmark comprising 5 datasets (2 novel additions, and 3 repurposed from existing open datasets) harvested from several domains, spanning education, e-commerce, online payment, and discussion forum. On average, we achieve 9.35% gain in type-prediction accuracy and 7.38% reduction in time-prediction RMSE across all datasets over SOTA MTPP baselines. We demonstrate the superior performance of our proposed model through extensive ablations and showcase its ability to capture complex combinations of external stimuli in a synthetic setup. |
| Researcher Affiliation | Collaboration | Subhendu Khatuya1, Ritvik Vij2, Paramita Koley3, Samik Datta1, Niloy Ganguly1 1 IIT Kharagpur, India 2 Amazon 3 Machine Intelligence Unit, Indian Statistical Institute Kolkata EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and components (Encoding, Personalization, External Stimuli Aware Attention, Causal Mask, Final Hidden Representation, Learning objective) in detail through text and mathematical formulations, but it does not contain a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | For the purpose of evaluation, we assemble a comprehensive array of 5 datasets, harvested from a wide variety of domains spanning online payment, online education (MOOC, as well as institute-wide online classes), and an online discussion forum on software engineering. Two of these datasets are carefully curated by us, whereas the other three have been repurposed (and reassembled) from existing open-domain datasets in order to feature both human action and external stimuli: MOOC-5M (repurposed from (Feng et al. 2019)), SO-10M (reassembled from (so2 2021; Paranjape, Benson, and Leskovec 2017)), and DUNNHUMBY (Gonen 2020). |
| Dataset Splits | Yes | Next, we split our dataset into two random, equally sized sets of sequences, A and B, and construct the train and test data as follows: Train: (full sequences in A) + (first 70% of events of all sequences in B); Test: (last 30% of events of all sequences in B). |
| Hardware Specification | No | The paper describes the training process parameters like optimizer, learning rate, and batch size, but does not mention any specific hardware (GPU, CPU, memory, etc.) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam optimizer, Sentence-BERT, Transformer Hawkes Process, and transformers, but it does not specify version numbers for these or any other software libraries or programming languages. |
| Experiment Setup | Yes | For training, we employ the Adam optimizer with learning rate 1e-3 and batch size 64 for 30 epochs, where we select the model with the least training error. Layer normalization and dropout of 0.1 are employed at the multi-head attention and feed-forward layers. The number of attention heads and attention layers is set to 4. The feed-forward network consists of MH = 1024 hidden nodes with GELU activation. We set the softness parameters βk = 1 and αk = 0.1, ∀k. Moreover, we set the loss weights (λl, λk, λt) = (1, 1, 0) while optimizing for type prediction and (λl, λk, λt) = (1, 0, 1) while optimizing for time prediction, inspired by (Park et al. 2022). |
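The split described under "Dataset Splits" (full sequences from a random half A for training; a 70%/30% prefix/suffix split of each sequence in the other half B) can be sketched as follows. This is a minimal illustration, not the authors' code; all names are hypothetical and sequences are assumed to be ordered lists of events.

```python
import random

def split_sequences(sequences, seed=0):
    """Illustrative train/test split: half the sequences (set A) go
    entirely to train; for each sequence in the other half (set B),
    the first 70% of events go to train and the last 30% to test."""
    rng = random.Random(seed)
    seqs = list(sequences)
    rng.shuffle(seqs)
    half = len(seqs) // 2
    set_a, set_b = seqs[:half], seqs[half:]

    train = [list(s) for s in set_a]   # full sequences from A
    test = []
    for s in set_b:
        cut = int(len(s) * 0.7)        # first 70% of events -> train
        train.append(list(s[:cut]))
        test.append(list(s[cut:]))     # last 30% of events -> test
    return train, test
```

Note that test sequences are suffixes of sequences whose prefixes appear in training, so a model can condition on each user's history when predicting held-out events.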
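The loss weighting in the experiment setup, (λl, λk, λt) = (1, 1, 0) for type prediction and (1, 0, 1) for time prediction, amounts to a weighted sum of three loss terms with one term switched off per objective. A hedged sketch, with illustrative names for the individual loss terms:

```python
def combined_loss(l_seq, l_type, l_time, weights):
    """Weighted sum of loss terms with weights (lambda_l, lambda_k, lambda_t).
    Term names are illustrative, not the paper's notation: l_seq is the
    sequence likelihood term, l_type the type-prediction loss, and
    l_time the time-prediction loss."""
    lam_l, lam_k, lam_t = weights
    return lam_l * l_seq + lam_k * l_type + lam_t * l_time

# Type-prediction objective: the time loss is zeroed out.
type_obj = combined_loss(2.0, 1.0, 5.0, weights=(1, 1, 0))  # -> 3.0
# Time-prediction objective: the type loss is zeroed out.
time_obj = combined_loss(2.0, 1.0, 5.0, weights=(1, 0, 1))  # -> 7.0
```

Training with two weight settings effectively yields two specialized models, one tuned for type accuracy and one for time RMSE, matching how the two metrics are reported separately.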