Capsule Network Projectors are Equivariant and Invariant Learners
Authors: Miles Everett, Aiden Durrant, Mingjun Zhong, Georgios Leontidis
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 Experimentation", "4.1 Training Protocol", "4.2 Downstream Evaluation", "4.3 Quantitative Evaluation of Equivariance", "4.4 Number of Capsules", "Table 1: Evaluation of invariant properties via downstream classification task.", "Table 2: Evaluation of equivariant properties via downstream rotation prediction (left) and colour prediction (right) tasks.", "Table 3: Quantitative evaluation of the predictor when using a Capsule network projector, using PRE, MRR and H@k.", "Table 4: Evaluation on a subset of Objaverse-LVIS using a ResNet-18 backbone." |
| Researcher Affiliation | Academia | "Miles Everett EMAIL Department of Computing Science University of Aberdeen, UK", "Aiden Durrant EMAIL Department of Computing Science University of Aberdeen, UK", "Mingjun Zhong EMAIL Department of Computing Science University of Aberdeen, UK", "Georgios Leontidis EMAIL Interdisciplinary Institute Department of Computing Science University of Aberdeen, UK" |
| Pseudocode | No | The paper describes methods and equations (e.g., Equation 1-8) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/AberdeenML/CapsIE. |
| Open Datasets | Yes | Our results demonstrate the ability of CapsNets to learn complex and generalised representations for large-scale, multi-task datasets compared to previous CapsNet benchmarks. Code is available at https://github.com/AberdeenML/CapsIE. [...] our study on the challenging 3DIEBench dataset and the corresponding problem definition presented in Garrido et al. (2023). [...] The full dataset and splits employed can be found at https://github.com/facebookresearch/SIE [...] Datasets such as Objaverse (Deitke et al., 2023) contain real-world scans [...] evaluate our approach on the Multi-Object Video (MOVi-E) dataset (Greff et al., 2022). |
| Dataset Splits | Yes | All the results evaluated by the aforementioned metrics are given in Table 3. Our CapsIE network outperforms EquiMod, Only Equivariance and SIE by a considerable margin across all metrics and for all dataset splits. [...] The full dataset and splits employed can be found at https://github.com/facebookresearch/SIE [...] Table 3: Quantitative evaluation of the predictor when using a Capsule network projector, using PRE, MRR and H@k. The source dataset, for which embeddings are computed, and the dataset used for retrieval are given in the format source-retrieval for PRE and source for MRR and H@k. Here source refers to the set from which embeddings are computed (train or val), while retrieval corresponds to the set used for comparison/retrieval (train, val, or all = train + val). |
| Hardware Specification | Yes | Each self-supervised 2000-epoch pretraining run took approximately 22 hours using three Nvidia A100 80GB GPUs for the 32-capsule model, whereas the 64-capsule model required approximately 25 hours using six Nvidia A100 80GB GPUs. For comparison, SIE training took approximately 26 hours using three Nvidia A100 80GB GPUs. All evaluation tasks are completed on a single Nvidia A100 80GB GPU. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | All methods employ a ResNet-18 encoder network (fθ). For the projection head (hϕ), we compare various hyperparameterisations... For primary benchmarking we train our model for 2000 epochs using the Adam (Kingma & Ba, 2014) optimiser with default settings, a fixed learning rate of 1e-3 and a batch size of 1024. For ablations and sensitivity analyses we train for 500 epochs and employ a batch size of 512, with other settings remaining unchanged. We have found in practice that 500 epochs presents a strong correlation with performance. For all evaluations, pre-training was done with the equivariant criterion optimising for viewpoint rotation transformations. Full details on these transformation groups and the criteria are given in prior sections. By default the objective function weightings are as follows: λinv = 0.1, λequi = 5, λV = 10, λC = 1. [...] Table 7: Training settings for our evaluations. Settings are the same for all numbers of capsules. [...] Table 8: Training settings for our supervised Self-Routing Capsule Network model. |
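The default objective weightings quoted above (λinv = 0.1, λequi = 5, λV = 10, λC = 1) amount to a weighted sum over four loss terms. A minimal sketch of that combination, assuming four precomputed scalar loss values (the individual loss terms and function name here are placeholders for illustration, not the authors' implementation):

```python
def total_loss(l_inv, l_equi, l_v, l_c,
               lam_inv=0.1, lam_equi=5.0, lam_v=10.0, lam_c=1.0):
    """Weighted sum of the invariance, equivariance, and the lambda_V /
    lambda_C terms, using the default weightings reported in the paper."""
    return lam_inv * l_inv + lam_equi * l_equi + lam_v * l_v + lam_c * l_c

# With unit losses, the result is simply the sum of the default weights.
print(total_loss(1.0, 1.0, 1.0, 1.0))
```

With all four losses set to 1.0 this evaluates to 0.1 + 5 + 10 + 1 = 16.1, making it easy to check that the weighting matches the reported configuration.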