LieRE: Lie Rotational Positional Encodings
Authors: Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael Moseley, Akshay S Chaudhari, Curtis Langlotz
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of LieRE on 2D and 3D vision tasks, showing that it generalizes well to higher input resolutions while maintaining computational efficiency. The code and checkpoints are publicly available at https://github.com/StanfordMIMI/LieRE. To assess the impact of LieRE and other positional encodings on ViT performance, we evaluate several encoding schemes across diverse tasks, including 2D and 3D image classification. Additionally, we investigate a fundamental spatial reasoning task where models must identify where an arrow points. Our experiments reveal that successful completion of this task specifically requires relative position encodings, highlighting their crucial role in spatial understanding. |
| Researcher Affiliation | Academia | 1Computer Science Department, Stanford University, USA 2Radiology Department, Stanford University, USA. Correspondence to: Sophie Ostmeier <EMAIL>, Brian Axelrod <EMAIL>. |
| Pseudocode | Yes | We include the pseudocode for the LieRE attention in Algorithm 3a in addition to the standard RoPE attention (Algorithm 3b). Figure 3. Comparison of the LieRE and RoPE-Mixed attention mechanisms. Algorithm 1 LieRE Attention ... Algorithm 2 RoPE Attention |
| Open Source Code | Yes | The code and checkpoints are publicly available at https://github.com/StanfordMIMI/LieRE. |
| Open Datasets | Yes | We begin with CIFAR-100 and ImageNet-1k benchmarks to evaluate LieRE in 2D vision tasks. To assess LieRE's performance on 3D data, we use the UCF101 video classification benchmark (Soomro et al., 2012). |
| Dataset Splits | Yes | To evaluate robustness in low-data regimes, we perform a data ablation study. Figure 4 shows that LieRE variants and RoPE-Mixed maintain significantly higher accuracy than baselines when training on only 20–90% of the CIFAR-100 dataset. We train the models on 800,000 examples and observe that they generally converge after the first 400,000 examples. We evaluate the accuracy on the ImageNet validation set with varying inference resolutions. |
| Hardware Specification | Yes | The CIFAR experiments were trained on 8x L4 GPUs with 24GB of VRAM each and all took under 30 minutes to complete. The basis capacity scaling experiment was conducted using RTX6000 GPUs. The ImageNet experiments were trained on 8x L40 GPUs and all took less than 2 days and 5 hours of runtime... The 3D classification experiments were conducted on either 8 A100 40GB GPUs or 4 A100 80GB GPUs... |
| Software Dependencies | No | The paper mentions the 'PyTorch Lightning framework for all experiments (Falcon, 2019)', the 'ADAM optimizer', and 'RandAugment (Cubuk et al., 2020)', but it does not provide specific version numbers for these software components or for any other libraries such as PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | The backbone for all experiments is configured as ViT-B, with 12 layers, a hidden dimension of 768, and an intermediate dimension of 3096. We use a dropout of 0.1. We use a cosine learning rate schedule with an initial learning rate of 1e-4 and train for 200 epochs. We use an effective batch size of 512. We use a patch size of 4×4 on the original 32×32 image for CIFAR-100 and a patch size of 16×16 on the randomly cropped and resized 224×224 image. We use the ADAM optimizer with betas of 0.9 and 0.999 and ϵ = 1e-8. |
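The core idea behind the LieRE attention referenced in the Pseudocode row can be sketched as follows: positions are mapped linearly onto learned skew-symmetric generators, the matrix exponential turns the resulting skew-symmetric matrix into a rotation, and queries and keys are rotated by their own position's rotation before the dot product. This is a minimal illustrative sketch, not the authors' implementation; the sizes (`head_dim`, `pos_dim`), the truncated Taylor series for the exponential, and the random generator initialization are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
head_dim, pos_dim = 8, 2  # illustrative sizes: per-head dim, 2D patch coordinates

# One learned generator per position coordinate; parameterized as B - B^T so
# each generator is skew-symmetric by construction (hence exp(A) is a rotation).
raw = rng.normal(size=(pos_dim, head_dim, head_dim))
generators = raw - raw.transpose(0, 2, 1)

def matrix_exp(A, terms=40):
    """Truncated Taylor series for the matrix exponential (adequate at this scale)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def liere_rotation(pos):
    """Map a position to a rotation: linear map into skew-symmetric matrices, then exp."""
    A = np.tensordot(pos, generators, axes=1)
    return matrix_exp(A)

# Rotate a query and a key by their respective positions before the dot product,
# so the attention score depends on the positions of both tokens.
q = rng.normal(size=head_dim)
k = rng.normal(size=head_dim)
Rq = liere_rotation(np.array([0.1, 0.3]))
Rk = liere_rotation(np.array([0.4, 0.2]))
score = (Rq @ q) @ (Rk @ k)
```

Because each generator is skew-symmetric, `Rq` and `Rk` are orthogonal with determinant 1, so rotating the queries and keys preserves their norms and only re-encodes their relative geometry.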
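The learning-rate schedule quoted in the Experiment Setup row (cosine schedule, initial rate 1e-4, 200 epochs) can be written out as a small sketch. It assumes a plain cosine decay to zero with no warmup, which the review excerpt does not specify, so treat the exact shape as an assumption:

```python
import math

base_lr, total_epochs = 1e-4, 200  # values from the reported setup

def cosine_lr(epoch):
    """Cosine decay from base_lr at epoch 0 to 0 at total_epochs (no warmup assumed)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

At the midpoint (epoch 100) this gives half the initial rate, 5e-5.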