ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

Authors: Yein Park, Chanwoong Yoon, Jungwoo Park, Donghyeon Lee, Minbyul Jeong, Jaewoo Kang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To overcome this, we introduce CHROKNOWBENCH, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, and temporal state. Our evaluation led to the following observations: (1) The ability to elicit temporal knowledge varies depending on the data format the model was trained on.
Researcher Affiliation | Collaboration | Yein Park1, Chanwoong Yoon1, Jungwoo Park1,3, Donghyeon Lee1,3, Minbyul Jeong2, Jaewoo Kang1,3 — Korea University1; Upstage AI2; AIGEN Sciences3
Pseudocode | Yes | Algorithm 1: Iterative Distractor Generation Algorithm; Algorithm 2: Chronological Prompting Algorithm
Open Source Code | Yes | Our datasets and code are publicly available at https://github.com/dmis-lab/ChroKnowledge
Open Datasets | Yes | Our datasets and code are publicly available at https://github.com/dmis-lab/ChroKnowledge
Dataset Splits | Yes | The test set consists of 10% of the total dataset from each domain.
Hardware Specification | Yes | The prediction is done with eight NVIDIA A100 GPUs (80GB).
Software Dependencies | No | We utilize the rapidfuzz library to compare the model's responses with the predefined labels. ... We utilize the spaCy en_core_web_lg model to detect named entities in the paragraphs...
Experiment Setup | Yes | We use a temperature set T ∈ {0, 0.7} to capture variations in prediction, where T includes both greedy decoding and temperature sampling. We set n = 5, meaning that we evaluate using five distinct combinations of few-shot exemplars to ensure a robust assessment.
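The Dataset Splits row reports a 10% per-domain test split. A minimal sketch of how such a split could be produced is below; the domain names, function name, and seeded shuffling are illustrative assumptions, not the authors' released code.

```python
import random

def make_test_split(dataset_by_domain, test_frac=0.1, seed=42):
    """Hold out test_frac of each domain's examples (hypothetical sketch)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for domain, examples in dataset_by_domain.items():
        examples = list(examples)
        rng.shuffle(examples)  # shuffle before slicing so the split is random
        n_test = max(1, int(len(examples) * test_frac))
        test[domain] = examples[:n_test]
        train[domain] = examples[n_test:]
    return train, test

# Toy usage with made-up domain names standing in for the paper's domains
data = {"general": list(range(100)), "biomedical": list(range(50))}
train, test = make_test_split(data)
print(len(test["general"]), len(test["biomedical"]))  # 10 5
```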
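The Software Dependencies row says rapidfuzz is used to compare model responses with predefined labels. The sketch below illustrates that matching pattern with the stdlib's `difflib.SequenceMatcher` as a stand-in (rapidfuzz's `fuzz.ratio` returns a similar 0–100 similarity score); the threshold value is an assumption for illustration.

```python
from difflib import SequenceMatcher

def fuzzy_match(response, labels, threshold=80.0):
    """Return the best-matching label if its similarity score (0-100)
    clears the threshold, else None. Stand-in for rapidfuzz's fuzz.ratio;
    the threshold of 80 is an illustrative assumption."""
    def score(a, b):
        return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
    best = max(labels, key=lambda label: score(response, label))
    return best if score(response, best) >= threshold else None

print(fuzzy_match("barack obama", ["Barack Obama", "Joe Biden"]))   # Barack Obama
print(fuzzy_match("completely unrelated", ["Barack Obama", "Joe Biden"]))  # None
```

Fuzzy rather than exact matching absorbs casing and minor surface differences between a free-form model answer and the canonical label.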
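The Experiment Setup row describes running each query under T ∈ {0, 0.7} with five distinct few-shot exemplar combinations. A minimal sketch of that evaluation grid is below; the model call is stubbed out, and the `k_shot` size and function names are assumptions, not details from the paper.

```python
import itertools
import random

TEMPERATURES = [0.0, 0.7]  # greedy decoding and temperature sampling
N_COMBOS = 5               # five distinct few-shot exemplar combinations

def generate(question, exemplars, temperature):
    """Hypothetical stand-in for a real LLM call."""
    return f"answer({question})"

def evaluate(question, exemplar_pool, k_shot=3, seed=0):
    """Run the question under every (temperature, exemplar combo) pair."""
    rng = random.Random(seed)
    combos = [rng.sample(exemplar_pool, k_shot) for _ in range(N_COMBOS)]
    results = {}
    for temp, combo_id in itertools.product(TEMPERATURES, range(N_COMBOS)):
        results[(temp, combo_id)] = generate(question, combos[combo_id], temp)
    return results

runs = evaluate("Who held office X in 2020?",
                exemplar_pool=[f"ex{i}" for i in range(10)])
print(len(runs))  # 2 temperatures x 5 combos = 10 runs
```

Sweeping both decoding regimes and exemplar choices separates variation caused by sampling from variation caused by prompt composition.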