CALLIC: Content Adaptive Learning for Lossless Image Compression
Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments Experimental Settings For pretraining, we make a collection of 3450 images from publicly available high-resolution datasets: DIV2K (Agustsson and Timofte 2017), with 800 training images, and Flickr2K (Lim et al. 2017), with 2650 training images. We crop these images into non-overlapping patches of size 64×64, resulting in a training set of 612,806 images. We train MGCF for 2M steps using the Adam optimizer with minibatches of size 32 and learning rate 5e-4. Adaptation Settings The maximum number of optimization steps is set to T = 50, with a learning rate 1e-2. A relatively small scale for the prior weight distribution is chosen, set as s = 0.05. Quantization width is set to w = 0.05. For RPFT, we choose b = 0.2, d = 0.1, e = 1 as default. Evaluation Settings We selected five high-resolution datasets: Kodak (Kodak 1993), WHU-RS19 validation (Xia et al. 2010), Histo24 (Bai et al. 2024), DIV2K (Agustsson and Timofte 2017) and CLIC2020 professional validation (CLIC.p) (Toderici et al. 2020) as our evaluation datasets. The Kodak, DIV2K, CLIC.p, and CLIC.m are natural image datasets. Histo24 is a dataset that includes 24 histological images of size 768×512, proposed by (Bai et al. 2024). We center-cropped 190 satellite images to 576×576 from the WHU-RS19 validation set, which were exported from Google Earth, to form our validation dataset, RS19. Our method is compared with traditional lossless codecs, including JPEG2000 (Skodras, Christopoulos, and Ebrahimi 2001), FLIF (Sneyers and Wuille 2016), and JPEG-XL (Alakuijala et al. 2019), and open-sourced learned methods like L3C (Mentzer et al. 2019), RC (Mentzer, Gool, and Tschannen 2020), iVPF (Zhang et al. 2021b), LC-FDNet (Rhee et al. 2022), DLPR (Bai et al. 2024) and ArIB-BPS (Zhang et al. 2024). Experimental Results Model Performance The compression performance results are summarized in Tab. 1. |
| Researcher Affiliation | Academia | Daxin Li¹*, Yuanchao Bai¹, Kai Wang¹, Junjun Jiang¹, Xianming Liu¹, Wen Gao² — ¹Faculty of Computing, Harbin Institute of Technology, Harbin; ²Department of Computer Science and Technology, Peking University, Beijing. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the CALLIC method, MGCF, CCI, and RPFT in detail using descriptive text and mathematical formulations (e.g., equations 5, 6, 7, 9, and the function for F(t)), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | For pretraining, we make a collection of 3450 images from publicly available high-resolution datasets: DIV2K (Agustsson and Timofte 2017), with 800 training images, and Flickr2K (Lim et al. 2017), with 2650 training images. ... We selected five high-resolution datasets: Kodak (Kodak 1993), WHU-RS19 validation (Xia et al. 2010), Histo24 (Bai et al. 2024), DIV2K (Agustsson and Timofte 2017) and CLIC2020 professional validation (CLIC.p) (Toderici et al. 2020) as our evaluation datasets. |
| Dataset Splits | Yes | For pretraining, we make a collection of 3450 images from publicly available high-resolution datasets: DIV2K (Agustsson and Timofte 2017), with 800 training images, and Flickr2K (Lim et al. 2017), with 2650 training images. We crop these images into non-overlapping patches of size 64×64, resulting in a training set of 612,806 images. ... Evaluation Settings We selected five high-resolution datasets: Kodak (Kodak 1993), WHU-RS19 validation (Xia et al. 2010), Histo24 (Bai et al. 2024), DIV2K (Agustsson and Timofte 2017) and CLIC2020 professional validation (CLIC.p) (Toderici et al. 2020) as our evaluation datasets. ... We center-cropped 190 satellite images to 576×576 from the WHU-RS19 validation set, which were exported from Google Earth, to form our validation dataset, RS19. |
| Hardware Specification | No | The paper provides runtime measurements in Table 2 (e.g., 'Enc. Time', 'Dec. Time'), but it does not specify any details about the hardware (e.g., GPU models, CPU types) on which these experiments were conducted. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer and refers to specific techniques like mixed quantization and straight-through estimator (STE) but does not provide any specific software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Pre-trained Settings For pretraining... We train MGCF for 2M steps using the Adam optimizer with minibatches of size 32 and learning rate 5e-4. Adaptation Settings The maximum number of optimization steps is set to T = 50, with a learning rate 1e-2. A relatively small scale for the prior weight distribution is chosen, set as s = 0.05. Quantization width is set to w = 0.05. For RPFT, we choose b = 0.2, d = 0.1, e = 1 as default. |
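The pretraining setup quoted above crops DIV2K and Flickr2K images into non-overlapping 64×64 patches to build the 612,806-image training set. A minimal sketch of that cropping step is shown below; the function name is our own, and discarding the right/bottom remainder when an image dimension is not a multiple of 64 is an assumption, since the paper does not state how edges are handled.

```python
import numpy as np

def crop_nonoverlapping_patches(image: np.ndarray, patch: int = 64) -> list:
    """Split an H x W (x C) image into non-overlapping patch x patch tiles.

    Any remainder at the right/bottom edge is discarded (assumed convention;
    the paper only says 'non-overlapping patches of size 64x64').
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, patch):        # step by `patch` -> no overlap
        for x in range(0, w - patch + 1, patch):
            patches.append(image[y:y + patch, x:x + patch])
    return patches

# Example: a 130 x 131 RGB image yields (130 // 64) * (131 // 64) = 2 * 2 = 4 patches.
```

Applied over all 3450 source images, a crop like this produces the fixed-size minibatches (size 32) consumed by the 2M-step Adam pretraining run described in the Experiment Setup row.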