Tractable Transformers for Flexible Conditional Generation
Authors: Anji Liu, Xuejie Liu, Dayuan Zhao, Mathias Niepert, Yitao Liang, Guy Van den Broeck
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines. ... In this section, we aim to empirically evaluate Tracformer's effectiveness in both conditional and unconditional generation. Specifically, our experiments are designed to answer two key questions: (i) How does Tracformer compare to other NAR architectures in terms of conditional generation performance? (ii) Can Tracformer scale effectively and outperform existing SoTA generative models in both conditional and unconditional tasks? To this end, we conduct two sets of experiments: In Section 6.1, we compare Tracformer with a range of NAR architectures on WikiText (Merity et al., 2022), LAMBADA (Paperno et al., 2016), and One Billion Words (1BW) (Chelba et al., 2013) datasets to evaluate its performance across diverse conditional queries. In Section 6.2, we scale Tracformer to OpenWebText (Gokaslan & Cohen, 2019) and benchmark it against SoTA discrete diffusion models, focusing on zero-shot conditional and unconditional performance. These experiments comprehensively evaluate Tracformer's advantages and its potential to serve as a more effective backbone for NAR generation. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Los Angeles 2Institute for Artificial Intelligence, University of Stuttgart 3Institute for Artificial Intelligence, Peking University 4School of Intelligence Science and Technology, Peking University 5Yuanpei College, Peking University. |
| Pseudocode | Yes | Algorithm 1: Span Masking Strategy; Algorithm 2: Mixed Masking Strategy for OpenWebText Training |
| Open Source Code | Yes | Code is available at https://github.com/liuanji/Tracformer. |
| Open Datasets | Yes | WikiText-103 (Merity et al., 2022), LAMBADA (Paperno et al., 2016), and One Billion Words (1BW) (Chelba et al., 2013) datasets... OpenWebText (Gokaslan & Cohen, 2019) |
| Dataset Splits | No | The paper mentions using |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for the experiments, such as GPU or CPU models. It only states: "Due to resource limitations, we only train Tracformer at the GPT-2 (base) scale." |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | For both CAR and AC training tasks, the sequence length is set to 1024 tokens, with a batch size of 256. The models are optimized using AdamW with β1 = 0.9, β2 = 0.95, and a weight decay of 0.1. The initial learning rate is set to 6 × 10⁻⁴ and follows a cosine decay schedule, with 1,000 warmup steps to stabilize the early training phase. The final learning rate is 6 × 10⁻⁵. Training is conducted for 30,000 steps. ... Tracformer is implemented with a 10-layer encoder-decoder architecture, maintaining a block size (i.e., maximum sequence length) of 1024 tokens. It utilizes sparse multi-scope attention with a constraint of 16 attended tokens per step... The decoder operates with a maximum stride of 1024 tokens... The model is configured with 9 attention heads and an embedding dimension of 576. A dropout rate of 0.1 is applied... |
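The learning-rate schedule quoted above (warmup to 6 × 10⁻⁴, cosine decay to 6 × 10⁻⁵ over 30,000 steps with 1,000 warmup steps) can be sketched as a standalone function. This is a minimal sketch assuming linear warmup from zero and cosine decay over the remaining steps; the function name and the exact warmup/decay interaction are assumptions, as the paper only states the hyperparameter values.

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
PEAK_LR = 6e-4       # initial (peak) learning rate
FINAL_LR = 6e-5      # final learning rate after decay
WARMUP_STEPS = 1_000
TOTAL_STEPS = 30_000

def lr_at_step(step: int) -> float:
    """Learning rate at a given 0-indexed optimizer step.

    Assumed behavior: linear warmup to PEAK_LR, then cosine decay
    to FINAL_LR over the remaining TOTAL_STEPS - WARMUP_STEPS steps.
    """
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay: progress runs from 0 at end of warmup to 1 at the end.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this would typically be wired up via `torch.optim.AdamW(params, lr=6e-4, betas=(0.9, 0.95), weight_decay=0.1)` together with a `LambdaLR` scheduler wrapping a function like the one above.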