Steerable Transformers for Volumetric Data
Authors: Soumyabrata Kundu, Risi Kondor
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our steerable transformer architecture in both two and three dimensions. Specifically, we focus on two vision tasks: image classification and semantic segmentation. In all experiments, we employed a hybrid architecture that integrates steerable convolutions with steerable transformers, trained using the Adam optimizer (Kingma & Ba, 2014). [...] Table 1: Comparison of steerable transformers and steerable convolutions. The mean and standard deviation are reported over 5 runs. For ModelNet10 we report both the z-rotation and SO(3)-rotation variations. For the PH2 dataset we report the Dice score for segmentation of the binary mask. For the BraTS dataset we report the Dice score individually for each tumor category. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Chicago, Chicago, USA 2Department of Computer Science, University of Chicago, Chicago, USA. Correspondence to: Soumyabrata Kundu <EMAIL>. |
| Pseudocode | No | The paper describes the steerable self-attention mechanism and related components using mathematical equations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for all experiments is available at https://github.com/SoumyabrataKundu/Steerable-Transformer. |
| Open Datasets | Yes | Rotated MNIST, a variant of the original MNIST dataset (LeCun et al., 2010)...; The ModelNet10 dataset (Wu et al., 2015)... The point cloud version of the dataset is available at https://github.com/antao97/PointCloudDatasets; The PH2 dataset (Mendonça et al., 2013)... The data for the experiment is available at https://www.fc.up.pt/addi/ph2%20database.html.; The Brain Tumor Segmentation (BraTS) dataset (Menze et al., 2015)... The data for the experiment is available at http://medicaldecathlon.com/. |
| Dataset Splits | Yes | Rotated MNIST: The dataset contains 12,000 training images and 50,000 testing images.; ModelNet10: The ModelNet10 dataset (Wu et al., 2015) consists of 3D CAD models from 10 common object categories, with a train/test split of 3991:908.; PH2: We randomly split the dataset into 100:50:50 for training, testing, and validation, respectively.; BraTS: We used a train/validation/test split of 243:96:145. |
| Hardware Specification | Yes | A batch size of 25 was used, and training the largest model took 4 hours on a 16GB GPU.; A batch size of 5 was used, and training the largest model took 12 hours on a 16GB GPU.; We used a batch size of 1, and training the largest model took 2 hours on a 16GB GPU. A larger batch size could not be used due to out-of-memory errors.; A batch size of 1 was used, and training the largest model took 40 hours on a 16GB GPU. A larger batch size could not be used due to out-of-memory errors. |
| Software Dependencies | No | The networks were trained using the Adam optimizer (Kingma & Ba, 2014)... While an optimizer is mentioned, no specific software libraries with version numbers (e.g., PyTorch 1.x, Python 3.x) are provided, which are crucial for reproducibility. |
| Experiment Setup | Yes | The networks were trained using the Adam optimizer (Kingma & Ba, 2014), starting with a learning rate of 5×10⁻³, which was reduced by a factor of 0.5 every 20 epochs, along with a weight decay of 5×10⁻⁴, for 150 epochs. A batch size of 25 was used... (Similar detailed setups are provided for other datasets in Appendix C) |
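The reported training recipe (Adam, initial learning rate 5×10⁻³ halved every 20 epochs, weight decay 5×10⁻⁴, 150 epochs) can be sketched as follows. This is a minimal illustration assuming PyTorch, with a trivial stand-in model; the paper's hybrid steerable convolution + transformer architecture and its data pipeline are not reproduced here.

```python
import torch

# Hypothetical stand-in model; replace with the authors' steerable
# convolution + transformer hybrid from their repository.
model = torch.nn.Linear(8, 10)

# Adam with initial lr 5e-3 and weight decay 5e-4, as reported.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=5e-4)

# Halve the learning rate every 20 epochs, as reported.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

lrs = []
for epoch in range(150):
    # ... one training epoch over batches (batch size 25 for Rotated MNIST)
    # would run here ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()

# lrs[0] == 5e-3, lrs[20] == 2.5e-3, lrs[40] == 1.25e-3, ...
```

With this schedule the learning rate is halved seven times over 150 epochs, ending at 5e-3 × 0.5⁷ ≈ 3.9×10⁻⁵.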