Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Authors: Yinghui Li, Haojing Huang, Jiayi Kuang, Yangning Li, Shu-Yu Guo, Chao Qu, Xiaoyu Tan, Hai-Tao Zheng, Ying Shen, Philip Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and detailed analyses on widely used datasets demonstrate the effectiveness of our method. We conduct experiments and analyses on various advanced LLMs and test them on both in-distribution and out-of-distribution data. The experimental results show that our approach achieves the highest Truthful rate, verifying the effectiveness of our proposed Adaptive Contrastive Learning strategy. Section 4 is dedicated to "EXPERIMENT". |
| Researcher Affiliation | Collaboration | The authors are affiliated with: 1Tsinghua University (Academic), 2Sun Yat-sen University (Academic), 3INFLY TECH (Shanghai) Co., Ltd. (Industry), 4Peng Cheng Laboratory (Academic/Public Research), 5University of Illinois Chicago (Academic). The presence of both academic institutions and an industry company (INFLY TECH) indicates a collaboration. |
| Pseudocode | No | The paper describes its methodology in Section 3, including mathematical formulations for loss functions (Equations 1-7) and a detailed explanation of its strategy. However, it does not contain a distinct, structured pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not explicitly provide an unambiguous statement of code release or a link to a source code repository. |
| Open Datasets | Yes | The paper uses and cites several publicly available datasets: TriviaQA (Joshi et al., 2017), Natural Questions (Kwiatkowski et al., 2019), and ALCUNA (Yin et al., 2023a). |
| Dataset Splits | Yes | For TriviaQA, the paper states: "we use 90% of the training set to construct a training set for contrastive learning data and 10% as a validation set. Since there is no standard answer in TriviaQA's test set, we select 11,313 Q&A pairs from the development set to build our final test set." For Natural Questions, it mentions: "The development set containing 3,610 instances is used to build our test set." For ALCUNA, it states: "We randomly sampled 1000 instances from the ALCUNA dataset to serve as our out-of-domain test set." |
| Hardware Specification | Yes | All experiments are conducted on Nvidia A100 80GB GPUs. |
| Software Dependencies | No | The paper mentions using specific base models (LLaMA-2-7B-chat, Mistral-7B-Instruct-v0.1) and the vLLM framework, but it does not specify version numbers for any of these software components or other libraries used for implementation. |
| Experiment Setup | Yes | The paper provides specific experimental setup details: "During the training of the LLaMA model, we used a batch size of 16, a learning rate of 5e-5, a context length of 1024, and trained for 2 epochs. For the Mistral model, we used a batch size of 16, a learning rate of 1e-5, a context length of 1024, and also trained for 2 epochs. The τ is set to 0.01 and the λ is set to 1." |
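The reported setup can be collected into a small configuration sketch. This is an illustrative reconstruction, not the authors' released code (none is available per the table above); the key names `tau` and `lambda_weight` are assumed labels for the paper's τ (temperature) and λ (loss-weight) symbols.

```python
def training_config(model: str) -> dict:
    """Return the per-model fine-tuning settings reported in the paper.

    Illustrative only: keys are hypothetical names for the reported values.
    """
    # Settings shared by both base models, as stated in the paper
    shared = {
        "batch_size": 16,
        "context_length": 1024,
        "epochs": 2,
        "tau": 0.01,           # temperature τ
        "lambda_weight": 1.0,  # loss-weighting λ
    }
    # Only the learning rate differs between the two models
    per_model_lr = {
        "LLaMA-2-7B-chat": 5e-5,
        "Mistral-7B-Instruct-v0.1": 1e-5,
    }
    return {**shared, "learning_rate": per_model_lr[model]}
```

Keeping the shared values in one place makes it explicit that the learning rate is the only hyperparameter the paper varies between the two base models.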