Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
COIN++: Neural Compression Across Modalities
Authors: Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, Arnaud Doucet
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the feasibility of our method by compressing various data modalities, from images and audio to medical and climate data. |
| Researcher Affiliation | Academia | Emilien Dupont* EMAIL Hrushikesh Loya* EMAIL Milad Alizadeh EMAIL Adam Goliński EMAIL Yee Whye Teh EMAIL Arnaud Doucet EMAIL University of Oxford |
| Pseudocode | No | The paper describes the MAML inner and outer loop updates using mathematical equations (7), (8), (9), and (10), but does not present a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code to reproduce all experiments in the paper can be found at https://github.com/EmilienDupont/coinpp. |
| Open Datasets | Yes | We evaluate COIN++ on four data modalities: images, audio, medical data and climate data. We use global temperature measurements from the ERA5 dataset (Hersbach et al., 2019)... The Kodak dataset (Kodak, 1991) contains 24 large scale images... We use random 32x32 patches from the Vimeo90k dataset (Xue et al., 2019)... To evaluate COIN++ on audio, we use the LibriSpeech dataset (Panayotov et al., 2015)... Finally, we train our model on brain MRI scans from the fastMRI dataset (Zbontar et al., 2018). |
| Dataset Splits | Yes | The resulting dataset contains 153,939 training images and 11,346 test images. (Vimeo90k, Section A.1) We then randomly split the filtered scans into 565 training volumes and 212 testing volumes. (fastMRI, Section A.2) The resulting dataset contains 12,096 grids of size 46x90, with 8510 training examples, 1166 validation examples and 2420 test examples. (ERA5, Section A.3) For training, we use the train-clean-100 split containing 28,539 examples and the test-clean split containing 2,620 examples. (LibriSpeech, Section A.4) |
| Hardware Specification | Yes | We measure the encoding time of COIN and COIN++ on a 1080Ti GPU and the decoding time on a 2080Ti GPU. For BPG, we measured encoding time on an AMD Ryzen 5 3600 (12) at 3.600GHz with 32GB of RAM. |
| Software Dependencies | Yes | We implement all models in PyTorch (Paszke et al., 2019)... We use the JPEG implementation from Pillow version 8.1.0. We use the OpenJPEG version 2.4.0 implementation of JPEG2000... We use BPG version 0.9.8... All autoencoder baselines were trained using the CompressAI implementations (Bégaint et al., 2020). We use the MP3 implementation from LAME version 3.100. |
| Experiment Setup | Yes | We use SGD for the inner loop with a learning rate of 1e-2 and Adam for the outer loop with a learning rate of 1e-6 or 3e-6. We normalize coordinates x to lie in [-1, 1] and features y to lie in [0, 1]. For all models, we set ω0 = 50 and used an inner learning rate of 1e-2, an outer learning rate of 3e-6 and batch size 64. All models were trained for 500 epochs (400k iterations). |