Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation

Authors: Anqi Li, Feng Li, Yuxi Liu, Runmin Cong, Yao Zhao, Huihui Bai

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results demonstrate the strong adaptation capability of Control-GIC, which achieves superior perceptual quality, flexibility, and compression efficiency compared with three types of recent state-of-the-art methods (generative, progressive, and variable-rate compression) using only a single unified model. The method is evaluated on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets.
Researcher Affiliation | Academia | Anqi Li (1,2), Feng Li (3), Yuxi Liu (1,2), Runmin Cong (4), Yao Zhao (1,2), Huihui Bai (1,2). (1) Institute of Information Science, Beijing Jiaotong University; (2) Beijing Key Laboratory of Advanced Information Science and Network Technology; (3) School of Computer Science and Engineering, Hefei University of Technology; (4) School of Control Science and Engineering, Shandong University.
Pseudocode | No | The paper describes the methodology using natural language and mathematical formulations (Equations 1-13) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository.
Open Datasets | Yes | "We randomly select 300K images from the Open Images (Krasin et al., 2017) dataset as our training set, where the images are randomly cropped to a uniform 256×256 resolution. We evaluate our method on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets."
Dataset Splits | No | The paper states: "We randomly select 300K images from the Open Images (Krasin et al., 2017) dataset as our training set, where the images are randomly cropped to a uniform 256×256 resolution." and "We evaluate our method on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets." While the paper names the training and evaluation datasets and the training-set size, it neither defines specific training/validation/test splits nor refers to standard splits for these datasets, which limits reproducibility.
Hardware Specification | Yes | "We train the model for 0.6M iterations with the learning rate of 5 × 10^-5 on NVIDIA RTX 3090 GPUs. Throughout the training, we maintain the ratio setting of (50%, 40%, 10%) for the fine, medium, and coarse granularity, respectively."
Software Dependencies | No | "Our method is based on MoVQ (Zheng et al., 2022), which improves the VQGAN model by adding spatial variants to representations within the decoder, avoiding the repeat artifacts in neighboring patches. We leverage the pre-trained codebook in MoVQ and carefully redesign the architecture."
Experiment Setup | Yes | "We train the model for 0.6M iterations with the learning rate of 5 × 10^-5 on NVIDIA RTX 3090 GPUs. Throughout the training, we maintain the ratio setting of (50%, 40%, 10%) for the fine, medium, and coarse granularity, respectively. Within our model, we take three representation granularities: 4×4, 8×8, and 16×16. The codebook C ∈ R^(k×d) comprises k = 1024 code vectors, each with a dimension of d = 4."
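The (50%, 40%, 10%) granularity ratio above can be illustrated with a minimal sketch of the bookkeeping: splitting a patch budget across the fine, medium, and coarse levels. The function name and rounding policy are illustrative assumptions, not taken from the paper, which does not specify how patches are assigned to levels.

```python
import math

def split_budget(num_patches: int, ratios=(0.5, 0.4, 0.1)) -> dict:
    """Divide a patch budget across fine/medium/coarse granularities.

    ratios follows the paper's training setting of (50%, 40%, 10%).
    Rounding remainders are assigned to the fine level (an assumption).
    """
    counts = [math.floor(num_patches * r) for r in ratios]
    counts[0] += num_patches - sum(counts)  # give any remainder to fine
    return dict(zip(("fine", "medium", "coarse"), counts))

print(split_budget(1000))  # -> {'fine': 500, 'medium': 400, 'coarse': 100}
```

For budgets that do not divide evenly (e.g. 1024 patches), the remainder handling keeps the total exact while staying close to the target ratios.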
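The codebook setting (k = 1024 codes of dimension d = 4) corresponds to the nearest-neighbor lookup used by VQGAN-style quantizers such as MoVQ. A minimal sketch, assuming standard Euclidean nearest-code assignment; function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each d-dim feature vector to the index of its nearest code.

    features: (n, d) encoder outputs at some granularity level
    codebook: (k, d) learned code vectors
    returns:  (n,) integer indices into the codebook
    """
    # Squared Euclidean distance from every feature to every code, shape (n, k).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
k, d = 1024, 4                          # paper's codebook size and code dim
codebook = rng.standard_normal((k, d))
features = rng.standard_normal((16, d))  # 16 example feature vectors
indices = quantize(features, codebook)   # code indices to be entropy-coded
recon = codebook[indices]                # quantized feature vectors
```

In practice the bitstream stores only the integer indices (plus the granularity map), and the decoder recovers the quantized features by table lookup.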