Projection Head is Secretly an Information Bottleneck
Authors: Zhuo Ouyang, Kaiwen Hu, Qi Zhang, Yifei Wang, Yisen Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our methods exhibit consistent improvement in the downstream performance across various real-world datasets, including CIFAR-10, CIFAR-100, and ImageNet-100. We believe our theoretical understanding on the role of the projection head will inspire more principled and advanced designs in this field. Code is available at https://github.com/PKU-ML/Projector_Theory. |
| Researcher Affiliation | Academia | 1 College of Engineering, Peking University 2 School of EECS, Peking University 3 State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 4 MIT CSAIL 5 Institute for Artificial Intelligence, Peking University |
| Pseudocode | No | The paper describes methods and theoretical analyses using mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/Projector_Theory. |
| Open Datasets | Yes | Empirically, our methods exhibit consistent improvement in the downstream performance across various real-world datasets, including CIFAR-10, CIFAR-100, and ImageNet-100. |
| Dataset Splits | Yes | Empirically, our methods exhibit consistent improvement in the downstream performance across various real-world datasets, including CIFAR-10, CIFAR-100, and ImageNet-100. ... We conduct our experiments on CIFAR-10, CIFAR-100, and ImageNet-100, with ResNet-18 as our backbone. |
| Hardware Specification | Yes | All experiments are conducted with at most two NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | On CIFAR-10 and CIFAR-100, we use learning rate 0.4, weight decay 10⁻⁴, InfoNCE temperature 0.2, and set λ to 10⁻⁴. Our projector adopts a Linear-ReLU-Linear structure, where we use 2048 as the hidden dimension and 256 as the output dimension. On ImageNet-100, we use learning rate 0.3, weight decay 10⁻⁴, InfoNCE temperature 0.2, and set λ to 0.01. We use the same projector structure but change the hidden dimension to 4096 and the output dimension to 512. |
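The Linear-ReLU-Linear projector described in the setup row can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation (their code is in the linked repository); the 512-dimensional input is an assumption based on the feature size of the ResNet-18 backbone, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_projector(d_in=512, d_hidden=2048, d_out=256):
    """Random weights for a Linear-ReLU-Linear projector.

    Dimensions follow the paper's CIFAR setup (hidden 2048, output 256);
    d_in=512 is assumed from the ResNet-18 backbone feature size.
    """
    W1 = rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden))
    W2 = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out))
    return W1, W2

def project(h, W1, W2):
    """Map backbone features h to the projection space used for the InfoNCE loss."""
    return np.maximum(h @ W1, 0.0) @ W2  # Linear -> ReLU -> Linear

W1, W2 = make_projector()
z = project(rng.normal(size=(8, 512)), W1, W2)
print(z.shape)  # (8, 256)
```

For the ImageNet-100 setup, the same structure applies with `d_hidden=4096` and `d_out=512`.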