Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Authors: Anqi Li, Feng Li, Yuxi Liu, Runmin Cong, Yao Zhao, Huihui Bai
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experimental results demonstrate the outstanding adaptation capability of Control-GIC, which achieves superior performance in perceptual quality, flexibility, and compression efficiency over three types of recent state-of-the-art methods, including generative, progressive, and variable-rate compression methods, using only a single unified model. We evaluate our method on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets. |
| Researcher Affiliation | Academia | Anqi Li¹,², Feng Li³, Yuxi Liu¹,², Runmin Cong⁴, Yao Zhao¹,², Huihui Bai¹,². ¹ Institute of Information Science, Beijing Jiaotong University; ² Beijing Key Laboratory of Advanced Information Science and Network Technology; ³ School of Computer Science and Engineering, Hefei University of Technology; ⁴ School of Control Science and Engineering, Shandong University. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using natural language and mathematical formulations (e.g., Equation 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We randomly select 300K images from the Open Images (Krasin et al., 2017) dataset as our training set, where the images are randomly cropped to a uniform 256×256 resolution. We evaluate our method on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets. |
| Dataset Splits | No | The paper states: 'We randomly select 300K images from the Open Images (Krasin et al., 2017) dataset as our training set, where the images are randomly cropped to a uniform 256×256 resolution.' and 'We evaluate our method on the Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), and CLIC2020 (Toderici et al., 2020) datasets.' While it identifies the training and evaluation datasets and their sizes, it neither specifies training/validation/test splits for reproducibility nor refers to standard splits of these datasets. |
| Hardware Specification | Yes | We train the model for 0.6M iterations with the learning rate of 5×10⁻⁵ on NVIDIA RTX 3090 GPUs. Throughout the training, we maintain the ratio setting of (50%, 40%, 10%) for the fine, medium, and coarse granularity, respectively. |
| Software Dependencies | No | Our method is based on MoVQ (Zheng et al., 2022), which improves the VQGAN model by adding spatial variants to representations within the decoder, avoiding the repeat artifacts in neighboring patches. We leverage the pre-trained codebook in MoVQ and carefully redesign the architecture. |
| Experiment Setup | Yes | We train the model for 0.6M iterations with the learning rate of 5×10⁻⁵ on NVIDIA RTX 3090 GPUs. Throughout the training, we maintain the ratio setting of (50%, 40%, 10%) for the fine, medium, and coarse granularity, respectively. Within our model, we take three representation granularities: 4×4, 8×8, and 16×16. The codebook C ∈ ℝ^{k×d} comprises k = 1024 code vectors, each with a dimension of d = 4. |
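
The reported configuration can be made concrete with a minimal sketch of vector quantization against a codebook of the stated size. This is not the authors' implementation: the random codebook stands in for the pre-trained MoVQ codebook, and `quantize` is a hypothetical helper illustrating only the reported dimensions (k = 1024 codes of dimension d = 4) and the three granularities with their training ratios.

```python
import numpy as np

# Reported configuration: codebook C in R^{k x d} with k = 1024, d = 4,
# and three spatial granularities used at ratios (50%, 40%, 10%).
K, D = 1024, 4
GRANULARITIES = ((4, 4), (8, 8), (16, 16))   # fine, medium, coarse patch sizes
GRANULARITY_RATIOS = (0.50, 0.40, 0.10)      # fine/medium/coarse training ratio

rng = np.random.default_rng(0)
codebook = rng.standard_normal((K, D))       # stand-in for the pre-trained codebook

def quantize(latents: np.ndarray) -> np.ndarray:
    """Map each d-dim latent vector to the index of its nearest codeword
    (squared Euclidean distance), as in standard VQ-based compression."""
    # latents: (N, D) -> pairwise distances: (N, K) -> nearest index: (N,)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Example: quantize 16 latent vectors; each index addresses one of the K codes.
latents = rng.standard_normal((16, D))
indices = quantize(latents)
```

Each transmitted index costs at most log2(1024) = 10 bits, so coarser granularities (one code per 16×16 patch instead of per 4×4 patch) spend fewer bits per pixel, which is the lever behind the variable-rate behavior the table describes.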