Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages
Authors: Ashutosh Bajpai, Tanmoy Chakraborty
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evidence underscores the superior performance of CLiTSSA compared to established baselines across three languages (Romanian, German, and French), encompassing three temporal tasks and a diverse set of four contemporaneous LLMs. This marks a significant step forward in addressing resource disparity in the context of temporal reasoning across languages. |
| Researcher Affiliation | Collaboration | Ashutosh Bajpai (1, 2), Tanmoy Chakraborty (1); 1: Indian Institute of Technology Delhi, India; 2: Wipro Research, India. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and procedures in narrative text, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and dataset are available at https://github.com/abiitd/clitssa. |
| Open Datasets | Yes | Source code and dataset are available at https://github.com/abiitd/clitssa. |
| Dataset Splits | Yes | Table 2: Dataset statistics for mTEMPREASON (Train / Dev / Test). Time Range: 1014-2022 / 634-2023 / 998-2023; L1: 400,000 / 4,000 / 4,000; L2: 16,017 / 5,521 / 5,397; L3: 13,014 / 4,437 / 4,426. |
| Hardware Specification | No | The paper mentions various LLMs used (LLaMA3-8B, Mistral-v1, Vicuna-7b-v1.5, Bloomz-7b1) but does not provide any specific details about the hardware (GPUs, CPUs, memory, etc.) on which these models were run or fine-tuned. |
| Software Dependencies | No | The paper mentions using the T5 model, multilingual Sentence-BERT, and distiluse-base-multilingual-cased-v1 as foundational models, but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | A three-shot ICL approach is used throughout the experimental setting, demonstrating superior outcomes compared to both one-shot and two-shot configurations. The values of h and w are set empirically at 30 and 10, respectively. To fine-tune the CLiTSSA retriever model, distiluse-base-multilingual-cased-v1 serves as the foundational model. This method is systematically applied to each low-resource language across temporal tasks L1, L2, and L3 to ensure optimum performance. Additionally, an integrated CLiTSSA retriever is fine-tuned across languages and temporal tasks. The Train and Dev splits of mTEMPREASON are used to construct the parallel corpus for fine-tuning the CLiTSSA retriever, with a separate held-out test set employed to benchmark all outcomes. Word-level F1 scores and exact match (EM) standards are used to quantify the LLMs' responses. The technical appendix covers ablations on few-shot counts and the parameters h and w, along with detailed hyperparameters. |
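The paper does not include the evaluation script, but the reported metrics (word-level F1 and exact match) follow a standard definition. A minimal sketch, assuming SQuAD-style token-overlap scoring with simple whitespace/lowercase normalization (the exact normalization used by the authors is not specified):

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace; a minimal, assumed normalization."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(reference))

def word_f1(prediction: str, reference: str) -> float:
    """Word-level F1 over the multiset intersection of tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty -> perfect match; one empty -> no overlap.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a model answer "in 1492" scored against the gold answer "1492" gets EM 0.0 but a partial F1 credit (precision 0.5, recall 1.0, F1 ≈ 0.667).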