Geometric Hyena Networks for Large-scale Equivariant Learning

Authors: Artem Moskalev, Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, Tommaso Mansi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluated on all-atom property prediction of large RNA molecules and full protein molecular dynamics, Geometric Hyena outperforms existing equivariant models while requiring significantly less memory and compute than equivariant self-attention. Notably, our model processes the geometric context of 30k tokens 20× faster than the equivariant transformer and allows 72× longer context within the same budget."
Researcher Affiliation | Industry | Artem Moskalev (1), Mangal Prakash (1), Junjie Xu (1,2,3), Tianyu Cui (1), Rui Liao (1), Tommaso Mansi (1); (1) Johnson and Johnson Innovative Medicine
Pseudocode | Yes | "Code 1. PyTorch implementation of the equivariant vector long convolution."
Open Source Code | No | No explicit statement about open-sourcing the code or a repository link for the Geometric Hyena implementation is provided in the paper. Code 1 shows an implementation snippet, but it is not a declaration of open-source availability for the full methodology.
Open Datasets | Yes | "These include Open Vaccine (Das et al., 2020) and Ribonanza-2k (He et al., 2024) datasets for stability and degradation prediction where our model outperforms other state-of-the-art equivariant baselines by up to 15%. Moreover, it achieves a 6% improvement on the mRNA switching-factor prediction task (Groher et al., 2018) and outperforms other equivariant methods by 9% on all-atom protein molecular dynamics prediction."
Dataset Splits | Yes | "For each experiment, we sample 2600 sequences for training, and 200 sequences each for validation and testing."
Hardware Specification | Yes | "We record all runtimes on NVIDIA A10G GPU with CUDA 12.2."
Software Dependencies | Yes | "We record all runtimes on NVIDIA A10G GPU with CUDA 12.2."
Experiment Setup | Yes | "We train all models for 400 epochs with a batch size of 8. We employ Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.001 and cosine learning rate scheduler (Loshchilov & Hutter, 2016) with 10 epochs of linear warm-up. The weight decay is set to 0.00001."
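The Pseudocode row cites Code 1, a PyTorch implementation of the equivariant vector long convolution. The paper's exact code is not reproduced here, but the core mechanism, an FFT-based global convolution in which each output vector is a scalar-weighted sum of input vectors (and therefore commutes with rotations), can be sketched in plain NumPy. All names and shapes below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def vector_long_conv(x, h):
    """Global (long) causal convolution of a sequence of 3D vectors
    with a scalar filter, computed via FFT in O(L log L).

    x : (L, 3) array of vector features
    h : (L,)   array of scalar filter taps
    Returns (L, 3): y[i] = sum_{j<=i} h[i-j] * x[j]
    """
    L = x.shape[0]
    n = 2 * L  # zero-pad so the circular FFT conv equals a linear conv
    H = np.fft.rfft(h, n=n)             # filter spectrum, shape (n//2+1,)
    X = np.fft.rfft(x, n=n, axis=0)     # per-coordinate spectra, (n//2+1, 3)
    y = np.fft.irfft(X * H[:, None], n=n, axis=0)
    return y[:L]
```

Because every output vector is a linear combination of input vectors with scalar coefficients, rotating all inputs by the same matrix R rotates the outputs identically, which is what makes this long convolution rotation-equivariant.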