Tuning-Free Accountable Intervention for LLM Deployment – A Metacognitive Approach
Authors: Zhen Tan, Jie Peng, Song Wang, Lijie Hu, Tianlong Chen, Huan Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on real-world datasets with LLM backbones in various sizes and architectures, and the results demonstrate that our intervention consistently improves inference-time predictions. |
| Researcher Affiliation | Academia | Zhen Tan1, Jie Peng2, Song Wang3, Lijie Hu4, Tianlong Chen5, Huan Liu1 — 1Arizona State University, 2University of Science and Technology of China, 3University of Virginia, 4King Abdullah University of Science and Technology, 5University of North Carolina at Chapel Hill |
| Pseudocode | No | The paper describes the methodology in Section 3 and illustrates it with Figure 2 and Figure 3, but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code https://github.com/Zhen-Tan-dmml/CLEAR.git. |
| Open Datasets | Yes | Our experiments are conducted on three datasets, including two widely-used real-world datasets, CEBaB (Abraham et al. 2022) and IMDB-C (Tan et al. 2023b), and a self-curated dataset, ASAP-C. |
| Dataset Splits | Yes | Table 1 (dataset statistics): CEBaB (5-way classification) Train/Dev/Test = 1755/1673/1685; IMDB-C (2-way classification) = 100/50/50; ASAP-C (regression) = 1005/281/283 |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions training models but gives no information on GPUs, CPUs, or other computing resources. |
| Software Dependencies | No | The paper mentions LLM backbones like BERT (Devlin et al. 2018), OPT (Zhang et al. 2022), and T5 (Raffel et al. 2020) with citations, but it does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used. |
| Experiment Setup | No | The paper mentions "We adopt an early stopping strategy, as per Abraham et al. (2022), to mitigate overfitting, with further details provided in Appendix B and G." but does not provide specific hyperparameters like learning rate, batch size, or optimizer settings in the main text. |
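The early-stopping strategy cited in the Experiment Setup row can be illustrated with a minimal sketch. The paper defers its actual settings to Appendix B and G, so the `patience` and `min_delta` values below are assumptions for illustration, not the authors' configuration:

```python
class EarlyStopper:
    """Stop training when dev-set loss has not improved for `patience` epochs.

    A generic early-stopping criterion of the kind the paper adopts
    (following Abraham et al. 2022) to mitigate overfitting; the exact
    hyperparameters are not given in the main text.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")      # best dev loss seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def step(self, dev_loss):
        """Record one epoch's dev loss; return True if training should stop."""
        if dev_loss < self.best - self.min_delta:
            self.best = dev_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `stopper.step(dev_loss)` would be called once per epoch after evaluating on the Dev split reported in Table 1, breaking out of the loop when it returns `True`.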