AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures
Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael Ng, Zhenguo Li, Zhaoqiang Liu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct a comprehensive empirical evaluation of the performance of AlgoFormer in tackling challenging tasks, specifically addressing regression with representation, AR(q) with representation, and CoT with MLPs, as outlined in Section 2. Additionally, we implement AlgoFormer on German-English neural machine translation and AG News classification, demonstrating its expressiveness and effectiveness in real-world language tasks. Figure 2 illustrates the validation error trends, showcasing a decrease with an increasing number of in-context samples, aligning with our intuition. Crucially, the AlgoFormer consistently outperforms both the standard and the vanilla looped transformer across all tasks, highlighting its superior expressiveness in algorithm learning. |
| Researcher Affiliation | Collaboration | 1 National University of Singapore, 2 Chinese University of Hong Kong, 3 Huawei Noah's Ark Lab, 4 Hong Kong Baptist University, 5 University of Electronic Science and Technology of China |
| Pseudocode | No | The paper describes algorithms and methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Specifically, we focus on Neural Machine Translation using the IWSLT 2015 German-English dataset. We also implement the proposed AlgoFormer on the text classification task using various datasets (AG News, IMDB, DBPedia, Yelp Review, and Yahoo News). |
| Dataset Splits | No | The paper states 'For synthetic tasks, we utilize N = 40 in-context samples as input prompts', and mentions using datasets like IWSLT 2015 German-English, AG News, IMDB, etc., but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for any of these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers, nor any other specific software dependencies with versions. |
| Experiment Setup | Yes | In all experiments, we adopt the decoder-based AlgoFormer, standard transformer (GPT-2), and vanilla looped transformer Yang et al. (2024). For synthetic tasks, we utilize N = 40 in-context samples as input prompts and d = 20 dimensional vectors with D = 256 dimensional positional embeddings for all experiments. To ensure fairness in comparisons, all models are trained using the Adam optimizer, with a learning rate η = 1e-4 and a total of 500K iterations to ensure convergence. The standard transformer is designed to have L = 12 layers, while the pre-, looped-, and post-transformers are all implemented as one layer each. The default setting for the AlgoFormer, as well as the vanilla looped transformer, involves setting (T, T) = (20, 15). |
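The experiment-setup row describes AlgoFormer's three-stage structure: a one-layer pre-transformer, a one-layer transformer applied in a loop, and a one-layer post-transformer. A minimal sketch of that control flow, with placeholder callables standing in for the actual attention layers (the function names and toy blocks here are illustrative assumptions, not the paper's implementation):

```python
# Hypothetical sketch of the AlgoFormer computation pattern: apply the
# pre-transformer once, the looped transformer block T times, then the
# post-transformer once. Each stage is a placeholder callable so only
# the control flow is shown; real stages would be transformer layers.

def algoformer_forward(x, pre, looped, post, T=20):
    """Pre once, looped block T times, post once."""
    h = pre(x)
    for _ in range(T):
        h = looped(h)
    return post(h)

# Toy usage with scalar placeholder blocks (illustrative only).
out = algoformer_forward(
    1.0,
    pre=lambda h: h + 1,      # stands in for the pre-transformer layer
    looped=lambda h: h * 2,   # stands in for the looped transformer layer
    post=lambda h: h - 1,     # stands in for the post-transformer layer
    T=3,
)
print(out)  # (1 + 1) * 2**3 - 1 = 15.0
```

Weight sharing across the T loop iterations is what keeps the looped stage at one layer of parameters while still allowing an iterative, algorithm-like computation.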