Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Authors: Feilong Tang, Zile Huang, Chengzhi Liu, Qiang Sun, Harry Yang, Ser-Nam Lim
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments reveal a correlation between the eigenspectrum and hallucinations across various MLLMs and show that TAME reduces the percentage of hallucinated objects. Code released at https://github.com/Everlyn-Labs/ANTRP. |
| Researcher Affiliation | Collaboration | Feilong Tang1,2, Zile Huang1,2, Chengzhi Liu4, Qiang Sun2,3, Harry Yang1,2, Ser-Nam Lim2,5; 1HKUST, 2Everlyn AI, 3University of Toronto, 4University of Liverpool, 5UCF |
| Pseudocode | Yes | Algorithm 1 Pseudo-code of TAME in a PyTorch-like style. |
| Open Source Code | Yes | Code released at https://github.com/Everlyn-Labs/ANTRP. |
| Open Datasets | Yes | We perform the CHAIR evaluation on the MSCOCO dataset (Lin et al., 2014)...HalluBench (Zhao et al., 2023) represents a more advanced benchmark, utilizing detailed object-level descriptions from the VG dataset (Krishna et al., 2017)...SEED-Bench (Li et al., 2023a)...GQA (Hudson & Manning, 2019)...Vizwiz (Gurari et al., 2018)...MME (Fu et al., 2023)...MMBench (Liu et al., 2025b)...POPE (Li et al., 2023c)...Wikitext-103 (Merity et al., 2016) and MiniPile (Kaddour, 2023) datasets |
| Dataset Splits | Yes | Following the Baseline method, we randomly select 500 images from the validation set of COCO 2014 and prompt various MLLMs...The evaluation is conducted across three distinct splits: the random split, where objects are randomly selected from the entire dataset; the popular split, which evaluates the recognition of frequently occurring objects; and the adversarial split, which assesses the ability of model to detect objects closely related to those present in the image. |
| Hardware Specification | Yes | Experiments are performed on NVIDIA H20/H100 GPUs. |
| Software Dependencies | No | Algorithm 1 Pseudo-code of TAME in a PyTorch-like style. The paper mentions PyTorch but does not specify a version number for it or any other key software components. |
| Experiment Setup | Yes | Basically, the hyperparameter gamma of TAME is set to the default value of 1. Other parameters use the default settings, same as the Baseline...To ensure a fair evaluation, we impose two different maximum token limits, as the length of generated sequences can significantly affect CHAIR scores (CS and CI). |
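
The CHAIR scores (CS and CI) referenced in the table can be illustrated with a minimal sketch. It assumes the commonly used definitions, which this summary does not spell out: CS is the fraction of captions that mention at least one object absent from the image, and CI is the fraction of mentioned objects that are absent. The function name `chair_scores` and the set-based input format are hypothetical, and treating each caption's objects as a set (ignoring repeated mentions) is a simplification; this is not the paper's evaluation code.

```python
from typing import Iterable, Set, Tuple


def chair_scores(
    samples: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float]:
    """Return (CS, CI) for pairs of (mentioned objects, ground-truth objects).

    Assumed definitions (not taken from the summary above):
      CS = captions with >= 1 hallucinated object / all captions
      CI = hallucinated objects / all mentioned objects
    """
    n_captions = 0
    n_hallucinated_captions = 0
    n_mentioned = 0
    n_hallucinated = 0
    for mentioned, ground_truth in samples:
        # Objects named in the caption that are not in the image annotations.
        hallucinated = mentioned - ground_truth
        n_captions += 1
        n_hallucinated_captions += 1 if hallucinated else 0
        n_mentioned += len(mentioned)
        n_hallucinated += len(hallucinated)
    # Guard against empty inputs to avoid division by zero.
    cs = n_hallucinated_captions / n_captions if n_captions else 0.0
    ci = n_hallucinated / n_mentioned if n_mentioned else 0.0
    return cs, ci


# Toy example: the first caption hallucinates "frisbee", the second is clean.
toy_samples = [
    ({"dog", "frisbee"}, {"dog"}),
    ({"cat"}, {"cat", "sofa"}),
]
cs, ci = chair_scores(toy_samples)
```

Under these definitions a longer caption has more chances to mention an absent object, which is consistent with the table's note that maximum token limits are fixed when comparing CHAIR scores.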