Refine Knowledge of Large Language Models via Adaptive Contrastive Learning

Authors: Yinghui Li, Haojing Huang, Jiayi Kuang, Yangning Li, Shu-Yu Guo, Chao Qu, Xiaoyu Tan, Hai-Tao Zheng, Ying Shen, Philip Yu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Extensive experiments and detailed analyses on widely used datasets demonstrate the method's effectiveness. The authors conduct experiments on several advanced LLMs and test on both in-distribution and out-of-distribution data; the results show the approach achieves the highest Truthful rate, verifying the proposed Adaptive Contrastive Learning strategy. Section 4 is dedicated to "EXPERIMENT".
Researcher Affiliation: Collaboration. The authors are affiliated with Tsinghua University (academic), Sun Yat-sen University (academic), INFLY TECH (Shanghai) Co., Ltd. (industry), Peng Cheng Laboratory (academic/public research), and University of Illinois Chicago (academic). The mix of academic institutions and an industry company (INFLY TECH) indicates a collaboration.
Pseudocode: No. The paper describes its methodology in Section 3, including mathematical formulations for the loss functions (Equations 1-7) and a detailed explanation of its strategy, but it does not contain a distinct, structured pseudocode block or algorithm box.
Open Source Code: No. The paper provides neither an explicit statement of code release nor a link to a source code repository.
Open Datasets: Yes. The paper uses and cites several publicly available datasets: TriviaQA (Joshi et al., 2017), Natural Questions (Kwiatkowski et al., 2019), and ALCUNA (Yin et al., 2023a).
Dataset Splits: Yes. For TriviaQA, the paper states: "we use 90% of the training set to construct a training set for comparative learning data and 10% as a validation set. Since there is no standard answer in Trivia QA's test set, we select 11,313 Q&A pairs from the development set to build our final test set." For Natural Questions, it mentions: "The development set containing 3,610 instances is used to build our test set." For ALCUNA, it states: "We randomly sampled 1000 instances from the ALCUNA dataset to serve as our out-of-domain test set."
Hardware Specification: Yes. All experiments are conducted on NVIDIA A100 80GB GPUs.
Software Dependencies: No. The paper mentions using specific base models (LLaMA-2-7B-chat, Mistral-7B-Instruct-v0.1) and the vLLM framework, but it does not specify version numbers for these software components or for other implementation libraries.
Experiment Setup: Yes. The paper provides specific experimental setup details: "During the training of the LLaMA model, we used a batch size of 16, a learning rate of 5e-5, a context length of 1024, and trained for 2 epochs. For the Mistral model, we used a batch size of 16, a learning rate of 1e-5, a context length of 1024, and also trained for 2 epochs. The τ is set to 0.01 and the λ is set to 1."
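The hyperparameters quoted under Experiment Setup can be collected into a single reference structure. This is a minimal sketch for anyone attempting reproduction; the dictionary keys and model identifiers are illustrative assumptions, since the paper releases no code:

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are illustrative; no official config file exists.
TRAIN_CONFIG = {
    "llama-2-7b-chat": {
        "batch_size": 16,
        "learning_rate": 5e-5,
        "context_length": 1024,
        "epochs": 2,
    },
    "mistral-7b-instruct-v0.1": {
        "batch_size": 16,
        "learning_rate": 1e-5,
        "context_length": 1024,
        "epochs": 2,
    },
    "tau": 0.01,     # temperature τ in the contrastive loss
    "lambda": 1.0,   # loss-weighting coefficient λ
}
```

Note that the two models share every reported setting except the learning rate (5e-5 for LLaMA vs. 1e-5 for Mistral).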
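The split procedure quoted under Dataset Splits (90%/10% of the TriviaQA training set, test set drawn from the development set) can be sketched as follows. The function name, seed handling, and shuffling are assumptions for illustration; the paper does not describe its exact sampling code:

```python
import random

def build_splits(train_pairs, dev_pairs, seed=0):
    """Sketch of the described TriviaQA splits: 90% of the training
    set for training, 10% for validation, and a test set drawn from
    the development set (the paper selects 11,313 Q&A pairs)."""
    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    pairs = list(train_pairs)
    rng.shuffle(pairs)
    cut = int(0.9 * len(pairs))
    train, valid = pairs[:cut], pairs[cut:]
    test = dev_pairs[:11313]  # selection criterion in the paper is unspecified
    return train, valid, test
```

For Natural Questions and ALCUNA the paper instead uses the 3,610-instance development set and a random sample of 1,000 instances, respectively, so an analogous helper would differ only in the test-set construction.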
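Since the paper provides no pseudocode for its loss (Equations 1-7 are not reproduced here), the sketch below shows only a generic InfoNCE-style contrastive objective of the kind such a strategy builds on. The function name and scoring interface are assumptions; this is not the paper's Adaptive Contrastive Learning formulation, though it uses the same temperature τ = 0.01:

```python
import math

def info_nce_loss(pos_score, neg_scores, tau=0.01):
    """Generic InfoNCE-style contrastive loss (illustrative only):
    -log( exp(s+/τ) / (exp(s+/τ) + Σ exp(s-/τ)) ), computed in a
    numerically stable way via the max-subtraction trick."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

As expected of a contrastive objective, the loss shrinks toward zero when the positive score dominates the negatives and grows when a negative outscores the positive.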