Tuning-Free Accountable Intervention for LLM Deployment – a Metacognitive Approach

Authors: Zhen Tan, Jie Peng, Song Wang, Lijie Hu, Tianlong Chen, Huan Liu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive experiments on real-world datasets with LLM backbones of various sizes and architectures, and the results demonstrate that our intervention consistently improves inference-time predictions.
Researcher Affiliation Academia Zhen Tan (1), Jie Peng (2), Song Wang (3), Lijie Hu (4), Tianlong Chen (5), Huan Liu (1); (1) Arizona State University, (2) University of Science and Technology of China, (3) University of Virginia, (4) King Abdullah University of Science and Technology, (5) University of North Carolina at Chapel Hill
Pseudocode No The paper describes the methodology in Section 3 and illustrates it with Figure 2 and Figure 3, but does not include a dedicated pseudocode or algorithm block.
Open Source Code Yes Code: https://github.com/Zhen-Tan-dmml/CLEAR.git
Open Datasets Yes Our experiments are conducted on three datasets, including two widely-used real-world datasets, CEBaB (Abraham et al. 2022) and IMDB-C (Tan et al. 2023b), and a self-curated dataset ASAP-C.
Dataset Splits Yes Table 1: Statistics of experimented datasets and concepts.
CEBaB (5-way classification): Train / Dev / Test = 1755 / 1673 / 1685
IMDB-C (2-way classification): Train / Dev / Test = 100 / 50 / 50
ASAP-C (regression): Train / Dev / Test = 1005 / 281 / 283
Hardware Specification No The paper does not provide specific hardware details used for running its experiments. It mentions training models but gives no information on GPUs, CPUs, or other computing resources.
Software Dependencies No The paper mentions LLM backbones like BERT (Devlin et al. 2018), OPT (Zhang et al. 2022), and T5 (Raffel et al. 2020) with citations, but it does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used.
Experiment Setup No The paper states "We adopt an early stopping strategy, as per Abraham et al. (2022), to mitigate overfitting, with further details provided in Appendix B and G." but does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings in the main text.