ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Authors: Yein Park, Chanwoong Yoon, Jungwoo Park, Donghyeon Lee, Minbyul Jeong, Jaewoo Kang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To overcome this, we introduce CHROKNOWBENCH, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, and temporal state. Our evaluation led to the following observations: (1) The ability of eliciting temporal knowledge varies depending on the data format that the model was trained on. |
| Researcher Affiliation | Collaboration | Yein Park1, Chanwoong Yoon1, Jungwoo Park1,3, Donghyeon Lee1,3, Minbyul Jeong2, Jaewoo Kang1,3 — Korea University1, Upstage AI2, AIGEN Sciences3 |
| Pseudocode | Yes | Algorithm 1: Iterative Distractor Generation Algorithm Algorithm 2: Chronological Prompting Algorithm |
| Open Source Code | Yes | Our datasets and code are publicly available at https://github.com/dmis-lab/ChroKnowledge |
| Open Datasets | Yes | Our datasets and code are publicly available at https://github.com/dmis-lab/ChroKnowledge |
| Dataset Splits | Yes | The test set consists of 10% of the total dataset from each domain. |
| Hardware Specification | Yes | The precision is done with eight NVIDIA A100 GPUs (80GB). |
| Software Dependencies | No | We utilize the rapidfuzz library to compare the model's responses with the predefined labels. ... We utilize the spaCy en_core_web_lg model to detect named entities in the paragraphs... |
| Experiment Setup | Yes | We use a temperature set T ∈ {0, 0.7} to capture variations in prediction, where T includes both greedy decoding and temperature sampling. We set n as 5, meaning that we evaluate using five distinct combinations of few-shot exemplars to ensure robust assessment. |
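The fuzzy-matching step quoted under Software Dependencies (comparing model responses against predefined labels with rapidfuzz) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `fuzzy_match` helper, the 0.9 threshold, and the use of stdlib `difflib` in place of rapidfuzz are all assumptions for the sake of a self-contained example.

```python
from difflib import SequenceMatcher

def fuzzy_match(response: str, labels: list[str], threshold: float = 0.9) -> bool:
    """Return True if the response is similar enough to any predefined label.

    Hypothetical stand-in for the paper's rapidfuzz-based comparison;
    the 0.9 threshold is an assumption, not taken from the paper.
    """
    response = response.strip().lower()
    for label in labels:
        score = SequenceMatcher(None, response, label.strip().lower()).ratio()
        if score >= threshold:
            return True
    return False

# A near-exact answer (case/whitespace differences) matches; an unrelated one does not.
print(fuzzy_match("Barack Obama ", ["barack obama", "donald trump"]))  # True
print(fuzzy_match("Paris", ["barack obama", "donald trump"]))          # False
```

With rapidfuzz, the same comparison would typically use `rapidfuzz.fuzz.ratio`, which scores on a 0–100 scale and is substantially faster on large label sets.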