SC2 Benchmark: Supervised Compression for Split Computing
Authors: Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. |
| Researcher Affiliation | Academia | Yoshitomo Matsubara EMAIL Department of Computer Science University of California, Irvine Ruihan Yang EMAIL Department of Computer Science University of California, Irvine Marco Levorato EMAIL Department of Computer Science University of California, Irvine Stephan Mandt EMAIL Departments of Computer Science and Statistics University of California, Irvine |
| Pseudocode | No | The paper describes methods and processes in detail, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | We also release our code1 and sc2bench,2 a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing. 1https://github.com/yoshitomo-matsubara/sc2-benchmark 2https://pypi.org/project/sc2bench/ |
| Open Datasets | Yes | We use image data with relatively high resolution, including ImageNet (ILSVRC 2012) (Russakovsky et al., 2015), COCO 2017 (Lin et al., 2014), and PASCAL VOC 2012 datasets (Everingham et al., 2012). |
| Dataset Splits | Yes | Specifically, we use ImageNet (ILSVRC 2012) (Russakovsky et al., 2015), that consists of 1.28 million training and 50,000 validation samples... For object detection, we use the COCO 2017 dataset (Lin et al., 2014) to fine-tune the models. The training and validation splits in the COCO 2017 dataset have 118,287 and 5,000 annotated images, respectively. For semantic segmentation, we use the PASCAL VOC 2012 dataset (Everingham et al., 2012) with 1,464 and 1,449 samples for training and validation splits, respectively. |
| Hardware Specification | No | The paper discusses FLOPS and MAC as proxies for computing cost but explicitly states these are not well-defined or static values. While Appendix E shows results using 'encoder FLOPS approximated by PyTorch Profiler,' it also notes, 'Note that the reported FLOPS do not cover all the operations in the encoders as PyTorch Profiler supports only specific operations such as matrix multiplication and 2D convolution at the time of writing.' There is no specific mention of the hardware (e.g., GPU/CPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | This Python package is built on PyTorch (Paszke et al., 2019) and torchdistill (Matsubara, 2021) for reproducible SC2 studies, using CompressAI (Bégaint et al., 2020) and PyTorch Image Models (Wightman, 2019) for neural compression modules/models and reference models, respectively. The paper lists several software packages and their foundational papers, but it does not specify the exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Using the Adam optimizer (Kingma & Ba, 2015), we train the student model on the ImageNet dataset for 20 epochs with the batch size of 32. The initial learning rate is set to 10⁻³ and reduced by a factor of 10 at the end of the 5th, 10th, and 15th epochs... We set the training batch size to 2 and 8 for object detection and semantic segmentation tasks, respectively. The learning rate is reduced by a factor of 10 at the end of the 5th and 15th epochs... At the 1st stage, we train the student model for 10 epochs... We use Adam optimizer with batch size of 64 and an initial learning rate of 10⁻³. The learning rate is decreased by a factor of 10 after the end of the 5th and 8th epochs. Once we finish the 1st stage... we use the stochastic gradient descent (SGD) optimizer with an initial learning rate of 10⁻³, momentum of 0.9, and weight decay of 5e-4. We reduce the learning rate by a factor of 10 after the end of the 5th epoch, and the training batch size is set to 128. The balancing weight α and temperature τ for knowledge distillation are set to 0.5 and 1, respectively. |
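The multi-step learning-rate decay quoted in the experiment-setup cell (initial rate 10⁻³, reduced by a factor of 10 at the end of the 5th, 10th, and 15th epochs) can be sketched in plain Python. This is a minimal illustration of the decay rule only, not the authors' code; the function name and 1-indexed epoch convention are our own assumptions.

```python
def lr_at_epoch(epoch, initial_lr=1e-3, milestones=(5, 10, 15), gamma=0.1):
    """Return the learning rate in effect during a given epoch (1-indexed).

    Hypothetical helper: the rate is multiplied by `gamma` at the END of
    each milestone epoch, matching the ImageNet schedule quoted above.
    """
    decays = sum(1 for m in milestones if epoch > m)
    return initial_lr * gamma ** decays

# Epochs 1-5 run at 1e-3, epochs 6-10 at 1e-4, and so on.
```

In a PyTorch training loop, the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5, 10, 15], gamma=0.1)`, stepping the scheduler once per epoch; whether the authors used that scheduler class is not stated in the excerpt.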