Concept-Centric Token Interpretation for Vector-Quantized Generative Models
Authors: Tianze Yang, Yucheng Shi, Mengnan Du, Xuansheng Wu, Qiaoyu Tan, Jin Sun, Ninghao Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate CORTEX's efficacy in providing clear explanations of token usage in the generative process, outperforming baselines across multiple pretrained VQGMs. Our experiments validate the effectiveness of our framework in enhancing VQGMs' interpretability and enabling applications such as precise image editing and bias identification. |
| Researcher Affiliation | Academia | 1School of Computing, University of Georgia 2Department of Data Science, New Jersey Institute of Technology 3Department of Computer Science, New York University. Correspondence to: Ninghao Liu <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using textual explanations, mathematical formulations, and diagrams. There are no clearly labeled pseudocode or algorithm blocks present in the document. |
| Open Source Code | Yes | Our code is available at https://github.com/YangTianze009/CORTEX. |
| Open Datasets | Yes | To elucidate the selected T from our proposed explanation method ϕ of VQGMs, we use a synthetic dataset generated by VQGAN (Esser et al., 2021), which encompasses the same categories as ImageNet (synthetic dataset details in Appendix A.2). The images are evenly distributed across all ImageNet categories, resulting in 1,000, 300, and 50 images per category in the training, validation, and test sets, respectively. |
| Dataset Splits | Yes | The dataset consists of 1,000,000 training images, 300,000 validation images, and 50,000 test images. The images are evenly distributed across all ImageNet categories, resulting in 1,000, 300, and 50 images per category in the training, validation, and test sets, respectively. |
| Hardware Specification | No | The paper describes the experimental setup and training settings but does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several techniques and optimizers used, such as the Adam optimizer, StepLR scheduler, cross-entropy loss, AdamW optimizer, automatic mixed precision (AMP), and Gumbel-Softmax. However, it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Training setting for CE and RE. These information extractors were trained using a batch size of 256 for 80 epochs, with the task involving classification across 1000 classes. We employed the Adam optimizer with an initial learning rate of 0.001 and weight decay of 1e-4. To adjust the learning rate during training, we implemented a StepLR scheduler, which decreased the learning rate by a factor of 0.1 every 20 epochs. The loss function used for training was cross-entropy loss. Training setting for TE. Due to the distinct characteristics of transformer-based architectures, we adopted a specialized training strategy different from the CNN and ResNet approaches. Specifically, we employed the AdamW optimizer with weight decay 1e-4, combined with a hybrid learning rate schedule consisting of a linear warmup phase (10% of total iterations) followed by cosine annealing. The initial learning rate was set to 1e-3. For training stability and efficiency, we implemented mixed-precision training using automatic mixed precision (AMP) with gradient scaling. The model was trained for 80 epochs with a batch size of 256. |
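The two learning-rate schedules quoted in the experiment-setup row can be written down concretely. This is a minimal pure-Python sketch, not code from the paper's repository: the function names and the cosine-to-zero endpoint are assumptions, while the constants (base lr 1e-3, decay 0.1 every 20 epochs, 10% linear warmup) come from the reported settings.

```python
import math

def steplr(epoch, base_lr=1e-3, step=20, gamma=0.1):
    """StepLR schedule reported for the CE/RE extractors:
    the lr is multiplied by 0.1 every 20 epochs."""
    return base_lr * gamma ** (epoch // step)

def warmup_cosine(it, total_iters, base_lr=1e-3, warmup_frac=0.1):
    """Hybrid schedule reported for the TE extractor: linear warmup
    over the first 10% of iterations, then cosine annealing
    (annealing to zero is an assumption, not stated in the paper)."""
    warmup_iters = int(total_iters * warmup_frac)
    if it < warmup_iters:
        # Linear ramp from base_lr/warmup_iters up to base_lr.
        return base_lr * (it + 1) / warmup_iters
    # Cosine decay over the remaining 90% of iterations.
    progress = (it - warmup_iters) / (total_iters - warmup_iters)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

In a PyTorch setup, the first schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)`; the second would typically be assembled from a warmup scheduler followed by `CosineAnnealingLR`.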