On Disentangled Training for Nonlinear Transform in Learned Image Compression
Authors: Han Li, Shaohui Li, Wenrui Dai, Maida Cao, Nuowen Kan, Chenglin Li, Junni Zou, Hongkai Xiong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed approach can accelerate training of LIC models by 2 times and simultaneously achieves an average 1% BD-rate reduction. To the best of our knowledge, this is one of the first successful attempts to significantly improve the convergence of LIC with comparable or superior rate-distortion performance. We perform ablation studies to further evaluate the effectiveness of our proposed AuxT. |
| Researcher Affiliation | Academia | ¹Shanghai Jiao Tong University, ²Tsinghua Shenzhen International Graduate School, Tsinghua University |
| Pseudocode | No | The paper describes the method using prose, equations, and diagrams (Figure 5) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be released at https://github.com/qingshi9974/AuxT |
| Open Datasets | Yes | All the models are trained on the ImageNet-1k (Deng et al., 2009) dataset and optimized using the Adam optimizer (Kingma & Ba, 2015). We adopt three benchmark datasets, i.e., the Kodak image set (Kodak, 1993) with 24 images of 768×512 pixels, the Tecnick test set (Asuni & Giachetti, 2014) with 100 images of 1200×1200 pixels, and the CLIC Professional Validation dataset (CLIC, 2021) with 41 images of at most 2K resolution, for evaluation. |
| Dataset Splits | No | The paper mentions using ImageNet-1k for training and Kodak, Tecnick testset, and CLIC Professional Validation dataset for evaluation. While the latter are standard test sets, the paper does not specify the training/validation split for ImageNet-1k or other training data needed to reproduce the experimental setup. |
| Hardware Specification | Yes | Experiments are performed on an NVIDIA GeForce RTX 4090 GPU and an Intel Xeon Platinum 8260 CPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We set the batch size to 16 for convolution-based LIC models (Minnen et al., 2018; He et al., 2022) and 8 for transformer-based LIC models (Zou et al., 2022; Liu et al., 2023). We train the models without our AuxT for 0.6M and 2M iterations, respectively, and train the models with our AuxT for 0.6M and 1M iterations, respectively. The learning rate is initialized as 10⁻⁴ and is decayed by a factor of 10 after 0.55M iterations in the 0.6M-iteration scenario, after 0.9M iterations in the 1M-iteration scenario, and after 1.8M iterations in the 2M-iteration scenario. The Lagrangian multipliers λ in the R-D loss are {0.0025, 0.0035, 0.0067, 0.0130, 0.0250, 0.0483} for MSE-optimized models and {2.40, 4.58, 8.73, 16.64, 31.73, 60.50} for MS-SSIM-optimized models. The orthogonal regularization weight λ_orth is 0.1. |
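The training hyperparameters reported in the Experiment Setup row can be collected into a configuration sketch. This is a minimal illustration of the reported schedule, not the authors' released code; all names are hypothetical, and the pairing of iteration budgets to model families follows the paper's "respectively" phrasing.

```python
# Hypothetical config mirroring the reported setup (names are illustrative).
BATCH_SIZE = {"cnn": 16, "transformer": 8}

# Total training iterations: baseline vs. AuxT-assisted, per model family.
TOTAL_ITERS = {
    ("cnn", "baseline"): 600_000,
    ("transformer", "baseline"): 2_000_000,
    ("cnn", "auxt"): 600_000,
    ("transformer", "auxt"): 1_000_000,
}

# LR starts at 1e-4 and is decayed by 10x late in training,
# at 0.55M / 0.9M / 1.8M iterations for the 0.6M / 1M / 2M budgets.
DECAY_POINT = {600_000: 550_000, 1_000_000: 900_000, 2_000_000: 1_800_000}

def learning_rate(iteration: int, total_iters: int) -> float:
    """Step LR schedule as described in the paper's setup."""
    return 1e-4 if iteration < DECAY_POINT[total_iters] else 1e-5

# Lagrangian multipliers for the R-D loss, one model per lambda.
LAMBDA_MSE = [0.0025, 0.0035, 0.0067, 0.0130, 0.0250, 0.0483]
LAMBDA_MSSSIM = [2.40, 4.58, 8.73, 16.64, 31.73, 60.50]

# Weight of the orthogonality regularizer.
LAMBDA_ORTH = 0.1
```

Encoding the schedule as a function of the total iteration budget makes it explicit that the decay point shifts with the budget (always within the last ~10% of training), which is easy to lose when the three scenarios are described in prose.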