Entroformer: A Transformer-based Entropy Model for Learned Image Compression
Authors: Yichen Qian, Xiuyu Sun, Ming Lin, Zhiyu Tan, Rong Jin
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTAL RESULTS We evaluate the effects of our transformer-based entropy model by calculating the rate-distortion (RD) performance. Figure 5 shows the RD curves over the publicly available Kodak dataset (Kodak, 1993), using peak signal-to-noise ratio (PSNR) as the image quality metric. As shown in the left part, our Entroformer with the joint hyperprior and context modules outperforms the state-of-the-art CNN methods by 5.2% and BPG by 20.5% at low bit rates. (A minimal PSNR sketch appears below the table.) |
| Researcher Affiliation | Industry | Yichen Qian (Alibaba Group, Hangzhou, China); Ming Lin (Alibaba Group, Bellevue, WA 98004, USA); Xiuyu Sun (Alibaba Group, Hangzhou, China); Zhiyu Tan (Alibaba Group, Hangzhou, China); Rong Jin (Alibaba Group, Bellevue, WA 98004, USA) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/damo-cv/entroformer. |
| Open Datasets | Yes | We choose 14886 images from Open Images (Krasin et al., 2017) as our training data. |
| Dataset Splits | No | The paper specifies training data and a test set (Kodak dataset) but does not explicitly detail the use or splitting of a separate validation set. |
| Hardware Specification | Yes | All models are trained for 300 epochs with a batch size of 16 and a patch size of 384×384 on a 16GB Tesla V100 GPU card. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999, ε = 1 × 10⁻⁸, and base learning rate = 1 × 10⁻⁴. When training transformers, it is standard practice to use a warmup phase at the beginning of learning, during which the learning rate increases from zero to its peak value (Vaswani et al., 2017). We use a warmup covering 0.05 of the total epochs, after which the learning rate decays stepwise by a factor of 0.75 every 1/5 of the total epochs. Gradient clipping, set to 1.0, is also helpful in the compression setup. All models are trained for 300 epochs with a batch size of 16 and a patch size of 384×384 on a 16GB Tesla V100 GPU card. (A PyTorch sketch of this schedule follows the table.) |
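The Experiment Setup row describes a complete optimizer recipe: Adam with a linear warmup over the first 5% of epochs, stepwise 0.75× decay every 1/5 of the total epochs, and gradient clipping at 1.0. The snippet below is a minimal PyTorch sketch of that schedule, not the authors' released code: the tiny `nn.Linear` model, the dummy batch, and the dummy loss are placeholders standing in for the Entroformer network, the 384×384 image crops, and the rate-distortion objective.

```python
import torch
from torch import nn

# Reported hyperparameters: 300 epochs, batch size 16, Adam(0.9, 0.999, 1e-8),
# base LR 1e-4, 5% warmup, 0.75x decay every 1/5 of the epochs, clipping at 1.0.
EPOCHS, BASE_LR, CLIP = 300, 1e-4, 1.0
WARMUP = int(0.05 * EPOCHS)   # 15 warmup epochs
STEP = EPOCHS // 5            # decay step every 60 epochs

model = nn.Linear(8, 8)       # placeholder for the Entroformer network
optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR,
                             betas=(0.9, 0.999), eps=1e-8)

def lr_lambda(epoch: int) -> float:
    # Ramp linearly up to the base LR over the warmup epochs,
    # then apply 0.75**k stepwise decay every STEP epochs.
    if epoch < WARMUP:
        return (epoch + 1) / WARMUP
    return 0.75 ** ((epoch - WARMUP) // STEP)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(EPOCHS):
    x = torch.randn(16, 8)          # dummy batch in place of 384x384 crops
    loss = model(x).pow(2).mean()   # dummy loss in place of the R-D objective
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    scheduler.step()
```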
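The RD evaluation quoted in the Research Type row uses PSNR as the quality metric. For reference, here is a minimal sketch of PSNR for 8-bit images, assuming NumPy; `psnr` is an illustrative helper, not a function from the paper's code.

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Worked example: a reconstruction off by 1 level everywhere has MSE = 1,
# so PSNR = 10 * log10(255^2) ≈ 48.13 dB.
ref = np.full((384, 384, 3), 128, dtype=np.uint8)
rec = ref + 1
print(round(psnr(ref, rec), 2))  # 48.13
```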