PerSEval: Assessing Personalization in Text Summarizers
Authors: Sourish Dasgupta, Ankush Chander, Tanmoy Chakraborty, Parth Borad, Isha Motiyani
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on the benchmarking of ten SOTA summarization models on the PENS dataset, we empirically establish that (i) PerSEval is reliable w.r.t. human-judgment correlation (Pearson's r = 0.73; Spearman's ρ = 0.62; Kendall's τ = 0.42), (ii) PerSEval has high rank-stability, (iii) PerSEval as a rank-measure is not entailed by EGISES-based ranking, and (iv) PerSEval can be a standalone rank-measure without the need of any aggregated ranking. |
| Researcher Affiliation | Academia | (1) Dhirubhai Ambani Institute of Information & Communication Technology, India; (2) Indian Institute of Technology Delhi, India. Corresponding authors: EMAIL, EMAIL |
| Pseudocode | No | The paper provides mathematical formulations and definitions but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/KDM-LAB/Perseval-TMLR |
| Open Datasets | Yes | Microsoft PENS Dataset (News Domain). Our study, as in (Vansh et al., 2023), assesses models using test data from the PENS dataset provided by Microsoft Research (Ao et al., 2021). OpenAI CNN/Daily Mail Dataset (News Domain). To understand the applicability of PerSEval on mainstream gold-standard news datasets, we design an indirect evaluation methodology with the OpenAI CNN/DM dataset (validation and test) released by Stiennon et al. (2020). OpenAI TL;DR (Reddit) Dataset (Open Domain). To understand the broader applicability of PerSEval, we also appropriated the OpenAI TL;DR dataset (Stiennon et al., 2020). This dataset is a collection of 123,169 Reddit posts adopted from the dataset by Völske et al. (2017). |
| Dataset Splits | Yes | Our study, as in (Vansh et al., 2023), assesses models using test data from the PENS dataset provided by Microsoft Research (Ao et al., 2021). OpenAI CNN/Daily Mail Dataset (News Domain). To understand the applicability of PerSEval on mainstream gold-standard news datasets, we design an indirect evaluation methodology with the OpenAI CNN/DM dataset (validation and test) released by Stiennon et al. (2020). A subset of the validation dataset comprises 1038 posts that were fed into 13 policies to generate 7713 summaries. |
| Hardware Specification | Yes | System specifications: Machine architecture: x86_64; CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz; CPU Cores: 16; Thread(s) per core: 2. |
| Software Dependencies | No | The paper mentions software components such as ROUGE, BLEU, METEOR, BERTScore, Jensen-Shannon Distance, and InfoLM, as well as BERT Base (uncased; 110M params) as a pre-trained Masked Language Model. However, it does not provide specific version numbers for any of these software libraries or tools. |
| Experiment Setup | Yes | PerSEval hyper-parameters: α = 3, β = 1.7 (optimal β; see 3), and γ = 4. An 11-point hyper-parameter ablation study shows that the optimal correlation is at β = 1.7. |
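The reliability row above cites three correlation statistics between PerSEval rankings and human judgments (Pearson's r, Spearman's ρ, Kendall's τ). A minimal pure-Python sketch of how these statistics are computed is given below; the score lists are hypothetical placeholders, not the paper's actual data.

```python
# Illustrative computation of the three correlation statistics reported
# for human-judgment agreement. All input scores here are made up.
from math import sqrt

def pearson(x, y):
    # Pearson's r: linear correlation of raw scores.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # 1-based rank of each value (assumes no ties, for simplicity).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho: Pearson's r applied to the rank-transformed scores.
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # Kendall's tau: (concordant - discordant) pairs over all pairs.
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

human  = [0.91, 0.74, 0.68, 0.55, 0.49, 0.32]  # hypothetical human judgments
metric = [0.88, 0.80, 0.61, 0.59, 0.41, 0.38]  # hypothetical metric scores
print(round(pearson(human, metric), 2),
      round(spearman(human, metric), 2),
      round(kendall(human, metric), 2))
```

Since the two hypothetical score lists order the six systems identically, the rank-based statistics (ρ and τ) come out at 1.0 while Pearson's r stays slightly below; on the paper's real data the three values differ more (0.73 / 0.62 / 0.42) because rank agreement and linear agreement capture different aspects of reliability.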