AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael Ng, Zhenguo Li, Zhaoqiang Liu

TMLR 2025

Reproducibility Assessment (Variable / Result / LLM Response)
Research Type: Experimental
LLM Response: "In this section, we conduct a comprehensive empirical evaluation of the performance of AlgoFormer in tackling challenging tasks, specifically addressing regression with representation, AR(q) with representation, and CoT with MLPs, as outlined in Section 2. Additionally, we implement AlgoFormer on neural machine translation between German and English and on AG News classification, demonstrating its expressiveness and effectiveness in real-world language tasks. Figure 2 illustrates the validation error trends, showing a decrease with an increasing number of in-context samples, in line with our intuition. Crucially, AlgoFormer consistently outperforms both the standard and the vanilla looped transformer across all tasks, highlighting its superior expressiveness in algorithm learning."
Researcher Affiliation: Collaboration
LLM Response: 1. National University of Singapore; 2. Chinese University of Hong Kong; 3. Huawei Noah's Ark Lab; 4. Hong Kong Baptist University; 5. University of Electronic Science and Technology of China
Pseudocode: No
LLM Response: The paper describes algorithms and methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets: Yes
LLM Response: "Specifically, we focus on Neural Machine Translation using the IWSLT 2015 German-English dataset. We also implement the proposed AlgoFormer on the text classification task using various datasets (AG News, IMDB, DBPedia, Yelp Review, and Yahoo News)."
Dataset Splits: No
LLM Response: The paper states "For synthetic tasks, we utilize N = 40 in-context samples as input prompts" and mentions using datasets such as IWSLT 2015 German-English, AG News, and IMDB, but it does not specify explicit training, validation, or test splits (e.g., percentages or counts) for any of these datasets.
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies: No
LLM Response: The paper mentions using the Adam optimizer but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers, nor any other specific software dependencies with versions.
Experiment Setup: Yes
LLM Response: "In all experiments, we adopt the decoder-based AlgoFormer, the standard transformer (GPT-2), and the vanilla looped transformer of Yang et al. (2024). For synthetic tasks, we utilize N = 40 in-context samples as input prompts and d = 20 dimensional vectors with D = 256 dimensional positional embeddings for all experiments. To ensure fairness in comparisons, all models are trained using the Adam optimizer, with a learning rate η = 1e-4 and a total of 500K iterations to ensure convergence. The standard transformer is designed to have L = 12 layers, while the pre-, looped, and post-transformers are each implemented with one layer. The default setting for the AlgoFormer, as well as the vanilla looped transformer, involves setting (T, T) = (20, 15)."
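The reported setup can be collected into a single configuration object, which is how a reproduction attempt might pin down the stated hyperparameters. This is a minimal sketch: the class and field names are my own, not from the paper, and only the numeric values (N, d, D, η, iteration count, layer depths) come from the quoted passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgoFormerExperimentConfig:
    """Hyperparameters quoted in the paper's experiment-setup passage.

    Field names are illustrative; the paper gives only the symbols
    N, d, D, eta, L and the iteration count.
    """

    # Synthetic-task prompt construction
    n_in_context_samples: int = 40    # N = 40 in-context samples per prompt
    input_dim: int = 20               # d = 20 dimensional input vectors
    pos_embed_dim: int = 256          # D = 256 dimensional positional embeddings

    # Optimization (shared across all compared models for fairness)
    optimizer: str = "adam"
    learning_rate: float = 1e-4       # eta = 1e-4
    total_iterations: int = 500_000   # 500K iterations to ensure convergence

    # Architecture depths
    standard_transformer_layers: int = 12  # GPT-2 baseline, L = 12
    pre_layers: int = 1                    # pre-, looped-, and post-transformers
    looped_layers: int = 1                 # are each one layer deep
    post_layers: int = 1


cfg = AlgoFormerExperimentConfig()
print(cfg.learning_rate, cfg.total_iterations)
```

Freezing the dataclass makes the configuration immutable, which helps keep a reproduction run consistent with the values the paper reports.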