Masked Capsule Autoencoders

Authors: Miles Everett, Mingjun Zhong, Georgios Leontidis

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Across several experiments and ablation studies we demonstrate that, similarly to CNNs and ViTs, Capsule Networks can also benefit from self-supervised pretraining, paving the way for further advancements in this neural network domain. For instance, by pretraining on the Imagenette dataset consisting of 10 classes of ImageNet-sized images we achieve state-of-the-art results for Capsule Networks, demonstrating a 9% improvement compared to our baseline model.
Researcher Affiliation | Academia | Miles Everett, EMAIL, Department of Computing Science, University of Aberdeen, UK
Pseudocode | No | The paper includes mathematical equations for the self-routing mechanism (Equations 1, 2, 3) and loss functions (Equations 4, 5), but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using and citing third-party libraries such as the PyTorch Image Models library (TIMM) and FVCore for calculations, but it does not provide an explicit statement or link to the source code for the proposed Masked Capsule Autoencoders (MCAE) methodology.
Open Datasets | Yes | Initially, we provide a sanity check on the MNIST dataset (LeCun et al., 2010), to provide quick experimentation to ensure that our methods work at all. Next, we use both the Fashion MNIST and CIFAR-10 datasets (Xiao et al., 2017; Krizhevsky et al., 2009)... The Small NORB dataset (LeCun et al., 2004)... Finally, we use the Imagenette and Imagewoof datasets (Howard, 2019a;b) to test our network's performance on larger, more realistic datasets.
Dataset Splits | Yes | When a validation dataset has not been predefined, we randomly split 10% of the training dataset to act as our validation dataset. The best model is tested once on the test set of our datasets, with the best model being chosen based on the epoch with the lowest validation loss... The augmentations that we use for this dataset are that we standardise and take random 32x32 crops during training. At test time, we centre crop the images to 32x32 as defined in (Ribeiro et al., 2020)... 1) Training only on azimuths in (300, 320, 340, 0, 20, 40) and testing on azimuths in the range of 60 to 280. 2) Training on the elevations in (30, 35, 40) degrees from horizontal and then testing on elevations in the range of 45 to 70 degrees.
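The 10% held-out validation split described above can be sketched as follows. This is an illustrative reconstruction only: the paper does not release code, so the function name, seed handling, and index-based structure are assumptions, not the authors' implementation.

```python
import random

def train_val_split(indices, val_frac=0.1, seed=0):
    """Randomly hold out a fraction of the training set as validation.

    Hypothetical sketch of the paper's stated 10% random split; the
    actual implementation details are not published.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    # First n_val shuffled indices become validation, the rest training.
    return shuffled[n_val:], shuffled[:n_val]

train_idx, val_idx = train_val_split(range(50000))
```

Model selection then uses only `val_idx` (lowest validation loss per epoch), with the test set touched once at the end, as the excerpt describes.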
Hardware Specification | No | The paper states: "We would like to thank the University of Aberdeen's High Performance Computing facility for enabling this work and the anonymous reviewers for their constructive feedback." This is a general mention of a computing facility but lacks specific hardware details such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions using "the SGD optimizer" and "the cosine annealing learning rate scheduler" but does not specify their versions. It also refers to "Pytorch Image Library (TIMM) Wightman (2019)" and "FVCore library FAIR (2023)" with citations, but does not provide specific version numbers for these libraries or for the underlying machine learning framework (e.g., the PyTorch version).
Experiment Setup | Yes | All of our experiments follow the same experimental setup. This involves optionally pretraining the network, minus the class capsules, for 50 epochs, with 50% of patches removed and either the removed patches or the whole image as the reconstruction target. We then add the class capsules to our network and fully finetune it for 350 epochs, following the supervised training settings of (Hahn et al., 2019; Everett et al., 2023)... All models use the SGD optimizer with default settings and the cosine annealing learning rate scheduler with a 0.1 initial learning rate.
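Two components of this setup can be made concrete with a small sketch: sampling the 50% patch mask used during pretraining, and the cosine annealing learning-rate schedule with a 0.1 initial rate. Both functions below are hypothetical reconstructions from the stated hyperparameters (the paper relies on library implementations and publishes no code); the cosine formula shown is the standard Loshchilov & Hutter schedule, which is an assumption about which variant was used.

```python
import math
import random

def sample_patch_mask(num_patches, mask_ratio=0.5, seed=0):
    """Randomly select which patches to remove before pretraining.

    Sketch of the paper's "50% of patches removed" step; the sampling
    strategy (uniform, per-image) is assumed, not confirmed.
    """
    rng = random.Random(seed)
    n_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), n_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)

def cosine_annealing_lr(epoch, total_epochs, lr_init=0.1, lr_min=0.0):
    """Cosine annealing from lr_init down to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr_init - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )
```

For example, with a 14x14 patch grid (196 patches) the mask keeps 98 patches visible, and over the 350 finetuning epochs the learning rate decays from 0.1 at epoch 0 toward 0 at epoch 350.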