Frozen Language Models Are Gradient Coherence Rectifiers in Vision Transformers
Authors: Lichen Bai, Zixuan Xiong, Hai Lin, Guangwei Xu, Xiangjin Xie, Ruijie Guo, Zhanhui Kang, Hai-Tao Zheng, Hong-Gee Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the effectiveness of this strategy, making the practical application of the gradient rectification effect feasible. ... Our experiments verify that the frozen LLM block has a certain gradient coherence rectification effect. ... Experiments Datasets ImageNet-1K (Russakovsky et al. 2015), also known as ILSVRC 2012 ... Performance Evaluation After incorporating auxiliary training, we compare its performance with that of the vanilla ViT on the ImageNet and SSv2 datasets in terms of accuracy. ... Ablation Studies In Tab. 6, we focus on the impact of the weight of auxiliary training on overall performance. |
| Researcher Affiliation | Collaboration | 1Shenzhen International Graduate School, Tsinghua University 2Pengcheng Laboratory 3 Alibaba Cloud Computing 4 Machine Learning Platform Department, Tencent 5 Seoul National University |
| Pseudocode | No | The paper includes figures illustrating concepts and a framework diagram (Figure 6), but no explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide links to any code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | Datasets ImageNet-1K (Russakovsky et al. 2015) ... Something-Something-v2 (Goyal et al. 2017) ... CIFAR-100 (Krizhevsky et al. 2009) ... Caltech-256 (Griffin, Holub, and Perona 2007) |
| Dataset Splits | Yes | To analyze gradient changes, we choose the DeiT architecture (Touvron et al. 2021) and train it on CIFAR-100 (Krizhevsky et al. 2009) for 300 epochs. ... We split the dataset into training and testing sets in a 7:3 ratio. |
| Hardware Specification | Yes | updates on A800 and RTX 3090 devices. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and cosine annealing scheduling, but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | Implementation Details For the video understanding task, we use VideoMAE (Tong et al. 2022) and train it on the SSv2 dataset. We train for 40 epochs with a batch size of 24 for ViT-S, and 30 epochs with a batch size of 12 for ViT-B. ... For the image classification task, we utilize DeiT (Touvron et al. 2021) and train for 300 epochs. For ImageNet, we set the batch size to 1024. For CIFAR-100 and Caltech-256, we set the batch size to 256. And for Bar, we set the batch size to 64. We use the AdamW optimizer with a learning rate of 5e-4, weight decay of 1e-5, and cosine annealing scheduling for updates on A800 and RTX 3090 devices. |
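The experiment setup above pairs AdamW (learning rate 5e-4) with cosine annealing over 300 epochs. As a minimal sketch of what that schedule implies, the standard cosine annealing formula can be written in plain Python; the `min_lr=0.0` floor is an assumption, since the paper does not state a minimum learning rate.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, base_lr=5e-4, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over total_epochs.

    Standard schedule: lr(t) = min_lr + 0.5*(base_lr - min_lr)*(1 + cos(pi*t/T)).
    min_lr=0.0 is an assumed floor; the paper only specifies base_lr=5e-4.
    """
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))

# Starts at the full rate, halves at the midpoint, decays to min_lr at the end.
print(cosine_annealing_lr(0))    # 5e-4
print(cosine_annealing_lr(150))  # 2.5e-4
print(cosine_annealing_lr(300))  # 0.0
```

In frameworks such as PyTorch, the equivalent behavior is provided by `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max` set to the total epoch count.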