Accelerating Learned Image Compression Through Modeling Neural Training Dynamics
Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Overall, our proposed method significantly accelerates the training of LICs while reducing the number of trainable parameters and dimensions without compromising performance (Sec. 4.2, Sec. 4.3, Appendix Sec. A.1). 4 Experimental results. 4.1 Experimental settings. Training. We use the COCO2017 dataset (Lin et al., 2014) for training, which contains 118,287 images, each having around 640×420 pixels. We randomly crop 256×256 patches from these images. All models are trained using the Lagrange multiplier-based rate-distortion loss as defined in Eq. 2. Following the settings of CompressAI (Bégaint et al., 2020) and standard practice (He et al., 2022; Liu et al., 2023; Li et al., 2024a), we set λ to {18, 35, 67, 130, 250, 483} × 10⁻⁴. For all models in the "+ SGD" series, we train each model using the Adam optimizer with β1 = 0.9 and β2 = 0.999. The λ = 0.0018 models are trained for 120 epochs. For models with other λ values, we fine-tune the model trained with λ = 0.0018 for an additional 80 epochs. For all models in the "+ Proposed" series, we train each λ = 0.0018 model using the Adam optimizer for 70 epochs. For models with other λ values, we fine-tune the model trained with λ = 0.0018 for an additional 50 epochs. More details are provided in the Appendix Sec. A.8. Testing. Three widely-used benchmark datasets, including Kodak, Tecnick, and CLIC 2022, are used to evaluate the performance of the proposed method. We further demonstrate the robustness of the proposed method and its capacity for real-world application by conducting experiments on stereo images, remote sensing images, screen content images, and raw image compression, as shown in Appendix Sec. A.1. 4.2 Quantitative results. We compare our proposed method with standard SGD-trained models on prevalent complex LICs, including ELIC (He et al., 2022), TCM-S (Liu et al., 2023), and FLIC (Li et al., 2024a), to demonstrate its superior performance. We use SGD-trained models as the anchor to calculate BD-Rate (Bjøntegaard, 2001). |
| Researcher Affiliation | Academia | Yichi Zhang (Purdue University), Zhihao Duan (Purdue University), Yuning Huang (Purdue University), Fengqing Zhu (Purdue University) |
| Pseudocode | Yes | The complete STDET algorithm and CMD calculation are detailed in Appendix Sec. A.7, Algorithm 1 ("Algorithm 1 STDET Algorithm"). |
| Open Source Code | No | The paper lists implementations of other methods for comparison in Tables 12 and 13, such as "ELIC (He et al., 2022) https://github.com/InterDigitalInc/CompressAI" or "P-SGD (Li et al., 2022b) https://github.com/nblt/DLDR". However, the paper does not contain an explicit statement by the authors that they are releasing the source code for the methodology proposed in this paper (STDET and SMA), nor does it provide a direct link to such a repository. |
| Open Datasets | Yes | We use the COCO2017 dataset (Lin et al., 2014) for training, which contains 118,287 images, each having around 640×420 pixels. Testing. Three widely-used benchmark datasets, including Kodak, Tecnick, and CLIC 2022, are used to evaluate the performance of the proposed method. Cityscapes: This dataset contains 5,000 outdoor stereo image pairs (2048×1024 resolution), with 2,975 pairs allocated for training and 1,525 for testing. InStereo2K: This dataset consists of 2,060 indoor stereo image pairs (1080×860 resolution), split into 2,010 pairs for training and 50 for testing. GEset: This dataset consists of 8,064 RGB images... GF1set: The GF1set dataset is derived from the Chinese Gaofen-1 (GF-1) satellite... GF7set: The GF7set is obtained from the Chinese Gaofen-7 (GF-7) satellite... PANset: The panchromatic dataset is sourced from China's GF-6 satellite. For screen content image compression, we compare our proposed method with standard SGD-trained models on SFTIP (Zhou et al., 2024). The models are trained on the JPEGAI dataset (ISO/IEC JTC 1/SC29/WG1, 2023), which includes 5,264 images for training and 350 images for validation. For evaluation, we use the SIQAD dataset (Yang et al., 2015), comprising 22 high-resolution, high-quality images, and the SCID dataset (Ni et al., 2017), containing 200 high-resolution, high-quality images. For RAW image compression, we compare our proposed method against standard SGD-trained models on R2LCM (Wang et al., 2024). Training Details. We follow the training configurations specified in (Wang et al., 2024; Nam et al., 2022). The models are trained on the NUS dataset (Cheng et al., 2014), which contains raw images captured by multiple cameras. |
| Dataset Splits | Yes | We use the COCO2017 dataset (Lin et al., 2014) for training, which contains 118,287 images, each having around 640×420 pixels. We randomly crop 256×256 patches from these images. Cityscapes: This dataset contains 5,000 outdoor stereo image pairs (2048×1024 resolution), with 2,975 pairs allocated for training and 1,525 for testing. InStereo2K: This dataset consists of 2,060 indoor stereo image pairs (1080×860 resolution), split into 2,010 pairs for training and 50 for testing. GEset: ...4,992 images are used for training, and 3,072 are used for testing. GF1set: ...2,400 images are used for training, while the remaining images are reserved for testing. GF7set: ...3,445 images for training and 2,297 images for testing. PANset: ...2,100 images are used for training, and 1,600 are used for testing. The models are trained on the JPEGAI dataset (ISO/IEC JTC 1/SC29/WG1, 2023), which includes 5,264 images for training and 350 images for validation. The dataset is split into training, validation, and test sets following the protocol in (Nam et al., 2022). |
| Hardware Specification | Yes | Table 1: Computational Complexity and BD-Rate Compared to SGD. Training Conditions: 1 Nvidia 4090 GPU, i9-14900K CPU, 128GB RAM. Table 6: Computational Complexity and BD-Rate Compared to SGD for Stereo Image. Training Conditions: 1 Nvidia A40 GPU, AMD EPYC 7662 CPU, 1024GB RAM. Table 7: Computational Complexity and BD-Rate Compared to SGD for Remote Sensing Image. Training Conditions: 1 Nvidia A40 GPU, AMD EPYC 7662 CPU, 1024GB RAM. Table 8: Computational Complexity and BD-Rate Compared to SGD for Screen Content Image. Training Conditions: 1 Nvidia A40 GPU, AMD EPYC 7662 CPU, 1024GB RAM. Table 9: Computational Complexity and BD-Rate Compared to SGD for RAW Image. Training Conditions: 1 Nvidia A40 GPU, AMD EPYC 7662 CPU, 1024GB RAM. |
| Software Dependencies | No | The paper mentions software like "Adam optimizer", "CompressAI", "TensorFlow and PyTorch", and lists implementations of other learned image codecs (e.g., "ELIC (He et al., 2022) https://github.com/InterDigitalInc/CompressAI") and efficient training methods (e.g., "P-SGD (Li et al., 2022b) https://github.com/nblt/DLDR"). However, it does not provide specific version numbers for the software components (e.g., Python, PyTorch, CUDA) used in their own experimental setup. |
| Experiment Setup | Yes | Following the settings of CompressAI (Bégaint et al., 2020) and standard practice (He et al., 2022; Liu et al., 2023; Li et al., 2024a), we set λ to {18, 35, 67, 130, 250, 483} × 10⁻⁴. For all models in the "+ SGD" series, we train each model using the Adam optimizer with β1 = 0.9 and β2 = 0.999. The λ = 0.0018 models are trained for 120 epochs. For models with other λ values, we fine-tune the model trained with λ = 0.0018 for an additional 80 epochs. For all models in the "+ Proposed" series, we train each λ = 0.0018 model using the Adam optimizer for 70 epochs. For models with other λ values, we fine-tune the model trained with λ = 0.0018 for an additional 50 epochs. More details are provided in the Appendix Sec. A.8. Table 11: Training Hyperparameters. Training set: COCO 2017 train; # images: 118,287; Image size: around 640×420; Data augmentation: crop, h-flip; Train input size: 256×256; Optimizer: Adam; Learning rate: 1×10⁻⁴; LR schedule: ReduceLROnPlateau (factor: 0.5, patience: 5); Batch size: 16; Epochs (SGD): 120 (λ=0.0018), 80 (others); Epochs (Proposed): 70 (λ=0.0018), 50 (others); Gradient clip: 2.0; GPUs: 1 RTX 4090; Predefined epoch F: 20 (λ=0.0018), 10 (others); Embeddable parameters: 75%; Embedding period L: 1; True embedding percentage P: 1%; Dummy embedding percentage: min(t × 2%, 25%), where t is the current epoch; Moving average factor α: 0.8; Optimizer step l: 5. |
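The training objective quoted in the table (a Lagrange multiplier-based rate-distortion loss swept over six λ values) can be sketched as follows. This is a minimal illustration, not the paper's Eq. 2: the 255² MSE scaling is the common CompressAI convention and is an assumption here, and the rate/distortion inputs are placeholder numbers, not results from the paper.

```python
# Lagrange-multiplier rate-distortion loss, L = R + lambda * D, evaluated
# over the lambda grid {18, 35, 67, 130, 250, 483} x 10^-4 used in the paper.
LAMBDAS = [l * 1e-4 for l in (18, 35, 67, 130, 250, 483)]

def rd_loss(rate_bpp: float, distortion_mse: float, lam: float) -> float:
    """Rate-distortion loss with MSE distortion.

    The 255**2 factor (CompressAI convention, assumed here) maps a
    normalized MSE to the 8-bit pixel range.
    """
    return rate_bpp + lam * (255 ** 2) * distortion_mse

# Placeholder rate/distortion values, for illustration only.
for lam in LAMBDAS:
    print(f"lambda={lam:.4f}  loss={rd_loss(0.5, 1e-3, lam):.4f}")
```

Each λ trades rate against distortion differently, which is why the paper trains (or fine-tunes) one model per λ to obtain the six rate points used for BD-Rate computation.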