Concept-Centric Token Interpretation for Vector-Quantized Generative Models
Authors: Tianze Yang, Yucheng Shi, Mengnan Du, Xuansheng Wu, Qiaoyu Tan, Jin Sun, Ninghao Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate CORTEX's efficacy in providing clear explanations of token usage in the generative process, outperforming baselines across multiple pretrained VQGMs. Our experiments validate the effectiveness of our framework in enhancing VQGMs' interpretability and enabling applications such as precise image editing and bias identification. |
| Researcher Affiliation | Academia | 1School of Computing, University of Georgia 2Department of Data Science, New Jersey Institute of Technology 3Department of Computer Science, New York University. Correspondence to: Ninghao Liu <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using textual explanations, mathematical formulations, and diagrams. There are no clearly labeled pseudocode or algorithm blocks present in the document. |
| Open Source Code | Yes | Our code is available at https://github.com/YangTianze009/CORTEX. |
| Open Datasets | Yes | To elucidate the selected T from our proposed explanation method ϕ of VQGMs, we use a synthetic dataset generated by VQGAN (Esser et al., 2021), which encompasses the same categories as ImageNet (synthetic dataset details in Appendix A.2). The images are evenly distributed across all ImageNet categories, resulting in 1,000, 300, and 50 images per category in the training, validation, and test sets, respectively. |
| Dataset Splits | Yes | The dataset consists of 1,000,000 training images, 300,000 validation images, and 50,000 test images. The images are evenly distributed across all ImageNet categories, resulting in 1,000, 300, and 50 images per category in the training, validation, and test sets, respectively. |
| Hardware Specification | No | The paper describes the experimental setup and training settings but does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several techniques and optimizers used, such as the Adam optimizer, StepLR scheduler, cross-entropy loss, AdamW optimizer, automatic mixed precision (AMP), and Gumbel-Softmax. However, it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Training setting for CE and RE. These information extractors were trained using a batch size of 256 for 80 epochs, with the task involving classification across 1000 classes. We employed the Adam optimizer with an initial learning rate of 0.001 and weight decay of 1e-4. To adjust the learning rate during training, we implemented a StepLR scheduler, which decreased the learning rate by a factor of 0.1 every 20 epochs. The loss function used for training was cross-entropy loss. Training setting for TE. Due to the distinct characteristics of transformer-based architectures, we adopted a specialized training strategy different from the CNN and ResNet approaches. Specifically, we employed the AdamW optimizer with weight decay 1e-4, combined with a hybrid learning rate schedule consisting of a linear warmup phase (10% of total iterations) followed by cosine annealing. The initial learning rate was set to 1e-3. For training stability and efficiency, we implemented mixed-precision training using automatic mixed precision (AMP) with gradient scaling. The model was trained for 80 epochs with a batch size of 256. |
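The two learning-rate schedules quoted in the experiment-setup row can be written down concretely. This is a minimal pure-Python sketch, not code from the paper's repository: the function names and the cosine-to-zero endpoint are assumptions, while the constants (base lr 1e-3, decay 0.1 every 20 epochs, 10% linear warmup) come from the reported settings.

```python
import math

def steplr(epoch, base_lr=1e-3, step=20, gamma=0.1):
    """StepLR schedule reported for the CE/RE extractors:
    the lr is multiplied by 0.1 every 20 epochs."""
    return base_lr * gamma ** (epoch // step)

def warmup_cosine(it, total_iters, base_lr=1e-3, warmup_frac=0.1):
    """Hybrid schedule reported for the TE extractor: linear warmup
    over the first 10% of iterations, then cosine annealing
    (annealing to zero is an assumption, not stated in the paper)."""
    warmup_iters = int(total_iters * warmup_frac)
    if it < warmup_iters:
        # Linear ramp from base_lr/warmup_iters up to base_lr.
        return base_lr * (it + 1) / warmup_iters
    # Cosine decay over the remaining 90% of iterations.
    progress = (it - warmup_iters) / (total_iters - warmup_iters)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

In a PyTorch setup, the first schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)`; the second would typically be assembled from a warmup scheduler followed by `CosineAnnealingLR`.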